GPU clusterizer with neural networks #13981

Merged — 104 commits (gpu_clusterizer into dev), Mar 14, 2025
Changes shown from 84 commits

Commits (104):
d4dc46e
Copying kernels to implement NN clusterizer
ChSonnabend May 16, 2024
c191885
Merge branch 'dev' into gpu_clusterizer
ChSonnabend May 24, 2024
05831ef
First version of clusterizer in GPU code
ChSonnabend May 27, 2024
8515290
Merge branch 'gpu_clusterizer' of github.com:ChSonnabend/AliceO2 into…
ChSonnabend May 27, 2024
3f6c934
Adding a compiling and running version with single-threaded ONNX mode…
ChSonnabend May 29, 2024
8ba6805
Clusters now working by a hack
ChSonnabend May 29, 2024
6ec3c46
Working implementation of settings via GPUSettings.h and --configKeyV…
ChSonnabend Jun 6, 2024
626a46f
Merge branch 'AliceO2Group:dev' into gpu_clusterizer
ChSonnabend Jun 24, 2024
ab4653a
Modifying the onnx_interface to include the right headers
ChSonnabend Jun 24, 2024
04084c8
Adjusting initialization for new ONNXRuntime version
ChSonnabend Jun 24, 2024
01dc4a1
Adjusting global settings and CF code for several settings
ChSonnabend Jun 26, 2024
accd7ab
Adding return statement if cluster is rejected
ChSonnabend Jul 3, 2024
019b388
Merge branch 'AliceO2Group:dev' into gpu_clusterizer
ChSonnabend Jul 3, 2024
3473a06
Adding some statements back
ChSonnabend Jul 4, 2024
dfffdf5
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Oct 16, 2024
df21c96
Update to latest status of gpu clusterization
ChSonnabend Oct 17, 2024
06737fd
Fixing uchar -> uint8_t
ChSonnabend Oct 18, 2024
b148449
Adding utils header
ChSonnabend Oct 18, 2024
534da50
Updating kernels.cmake to uint8_t
ChSonnabend Oct 21, 2024
bb2cb6e
Please consider the following formatting changes
alibuild Oct 21, 2024
027e225
Merge pull request #6 from alibuild/alibot-cleanup-13610
ChSonnabend Nov 4, 2024
25093b3
Adding an ONNX CPU library in the O2 framework
ChSonnabend Nov 18, 2024
74cf0e7
Merge branch 'AliceO2Group:dev' into onnxruntime-cpu
ChSonnabend Nov 18, 2024
9232328
Please consider the following formatting changes
alibuild Nov 18, 2024
9a6a9e8
Merge pull request #7 from alibuild/alibot-cleanup-13709
ChSonnabend Nov 18, 2024
7251c5c
Fixing macOS build issues with calling O*.data()
ChSonnabend Nov 19, 2024
d0f4dd8
Fixing compiler issues and char -> uint8_t
ChSonnabend Nov 19, 2024
7859ab2
Fixing curly braces
ChSonnabend Nov 19, 2024
c6cb3e6
Fixing std::make_shared
ChSonnabend Nov 19, 2024
55621f0
Merge branch 'onnxruntime-cpu' into gpu_clusterizer
ChSonnabend Nov 20, 2024
a00a54b
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Nov 20, 2024
40bc437
Changing order for <CommonUtils/StringUtils.h>
ChSonnabend Nov 20, 2024
f0a8cc2
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Nov 22, 2024
d3aede4
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Dec 17, 2024
52b033f
Bug-fixing file name
ChSonnabend Dec 17, 2024
314a0ce
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Jan 17, 2025
684eb56
Making NN clusterizer more efficient
ChSonnabend Feb 6, 2025
9bd1ce4
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Feb 7, 2025
639b895
Changing constexpr
ChSonnabend Feb 7, 2025
3c4c587
Fixing build issues
ChSonnabend Feb 7, 2025
95bb2ff
Major changes to make clusterizer parallelizable. Problem remains: di…
ChSonnabend Feb 17, 2025
857f27d
Adjusting for default CF regression
ChSonnabend Feb 19, 2025
89c0105
Bug-fix for application of CF regression and logging message
ChSonnabend Feb 20, 2025
45d8071
Adding is_boundary check earlier to avoid out-of-bounds access
ChSonnabend Feb 22, 2025
984857e
Bug-fixes for boundary reading
ChSonnabend Feb 24, 2025
57862a6
Updating to use explicit calls to kernels instead of if-statements
ChSonnabend Feb 25, 2025
c55cfc2
Bug-fix for class label application
ChSonnabend Feb 26, 2025
0125c2a
Explicit casting solves regression issues. To be done: Correct publis…
ChSonnabend Feb 26, 2025
408787d
Bug-fixes
ChSonnabend Feb 26, 2025
e830697
Adding some documentation
ChSonnabend Mar 5, 2025
1ca9fa0
Please consider the following formatting changes
alibuild Mar 5, 2025
815cc30
Modifying for David's comments
ChSonnabend Mar 5, 2025
0bc4097
Merge pull request #10 from alibuild/alibot-cleanup-13981
ChSonnabend Mar 6, 2025
a478634
Modifications from comments on PR
ChSonnabend Mar 7, 2025
99ca93b
Merge branch 'gpu_clusterizer_2' into gpu_clusterizer
ChSonnabend Mar 7, 2025
db0c836
Please consider the following formatting changes
alibuild Mar 7, 2025
7ebdcb9
Merge pull request #12 from alibuild/alibot-cleanup-13981
ChSonnabend Mar 7, 2025
ff62b9d
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Mar 7, 2025
6c6cb95
iSlice -> iSector
ChSonnabend Mar 7, 2025
490170e
mISlice -> mISector
ChSonnabend Mar 7, 2025
bca1014
Minor bug-fixes
ChSonnabend Mar 7, 2025
b687967
Adjusting for comments
ChSonnabend Mar 8, 2025
70adf1e
Bug-fix for fullCI build
ChSonnabend Mar 8, 2025
06e26a8
Adding GPUd() for on-device functions
ChSonnabend Mar 8, 2025
bedb592
Fixing compile issues, only thing missing: conversion of float to float16
ChSonnabend Mar 10, 2025
e888298
Let's see if this does the trick
ChSonnabend Mar 10, 2025
21f5694
Making functions (constructors) GPUd() (GPUdDefault())
ChSonnabend Mar 10, 2025
66da84e
GPU kernels should now be findable
ChSonnabend Mar 10, 2025
e8af1c2
Adding ifdefs for standalone build and header exclusions in GPUORTFlo…
ChSonnabend Mar 10, 2025
08753dd
Modifying the approach to not use std:: types. Still needs to be test…
ChSonnabend Mar 10, 2025
9155cca
New version of clusterizer. Compiles locally, but segfaults in fillIn…
ChSonnabend Mar 11, 2025
05bc4b8
Please consider the following formatting changes
alibuild Mar 11, 2025
24bf104
Merge pull request #14 from alibuild/alibot-cleanup-13981
ChSonnabend Mar 11, 2025
ed323ec
Adjust for comments
ChSonnabend Mar 12, 2025
248f9c9
Please consider the following formatting changes
alibuild Mar 12, 2025
f1af003
Merge pull request #15 from alibuild/alibot-cleanup-13981
ChSonnabend Mar 12, 2025
a23fdc9
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Mar 12, 2025
bd3c8d1
Merging dev and adjusting build issues
ChSonnabend Mar 12, 2025
cc6c05c
Adjusting for comments
ChSonnabend Mar 12, 2025
6e809bf
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Mar 12, 2025
80f818d
Fixing incorrect #endif
ChSonnabend Mar 12, 2025
ac61052
Please consider the following formatting changes
alibuild Mar 12, 2025
814d94d
Merge pull request #16 from alibuild/alibot-cleanup-13981
ChSonnabend Mar 12, 2025
c03a60e
Fix indentation, remove duplicate define
davidrohr Mar 13, 2025
207ba9c
Fixing one memory issue. Segfault / memory leak persists
ChSonnabend Mar 13, 2025
c5b147f
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Mar 13, 2025
0978c19
Adjusting for new toNative function
ChSonnabend Mar 13, 2025
ad9696e
Fixing .finalize
ChSonnabend Mar 13, 2025
3377435
Adjusting CMakeLists and other bugs
ChSonnabend Mar 13, 2025
9893b43
Adding GPUCA_HAS_ONNX only to tracking
ChSonnabend Mar 13, 2025
bce04bc
Changing to fixed size for number of clusters
ChSonnabend Mar 13, 2025
713dd64
Fixed segfault. Not producing the right number of clusters yet.
ChSonnabend Mar 13, 2025
e66efb1
Network now accepts clusters over all sectors
ChSonnabend Mar 13, 2025
2b9b8da
Whitespaces...
ChSonnabend Mar 13, 2025
34419f3
Merge dev + fix-ups
ChSonnabend Mar 13, 2025
85d185e
Some weird formatting
ChSonnabend Mar 13, 2025
49352ab
Please consider the following formatting changes
alibuild Mar 13, 2025
90ef464
Merge pull request #17 from alibuild/alibot-cleanup-13981
ChSonnabend Mar 13, 2025
78c342d
Removing white-spaces
ChSonnabend Mar 14, 2025
6a7b17c
Adding necessary if-statement to avoid automatic model loading
ChSonnabend Mar 14, 2025
41d80d2
Merge dev and fixes
ChSonnabend Mar 14, 2025
bb163ea
Removing GPUConstantMem, adding interOpNumThreads option
ChSonnabend Mar 14, 2025
eabba5f
Found the bug where I lose clusters
ChSonnabend Mar 14, 2025
1e80754
Editor configured for whitespaces at EOF
ChSonnabend Mar 14, 2025
126 changes: 72 additions & 54 deletions Common/ML/include/ML/3rdparty/GPUORTFloat16.h

Large diffs are not rendered by default.

7 changes: 6 additions & 1 deletion Common/ML/include/ML/OrtInterface.h
@@ -41,6 +41,7 @@ class OrtModel
OrtModel(std::unordered_map<std::string, std::string> optionsMap) { reset(optionsMap); }
void init(std::unordered_map<std::string, std::string> optionsMap) { reset(optionsMap); }
void reset(std::unordered_map<std::string, std::string>);
bool isInitialized() { return mInitialized; }

virtual ~OrtModel() = default;

@@ -55,6 +56,9 @@ class OrtModel
template <class I, class O> // class I is the input data type, e.g. float, class O is the output data type, e.g. O2::gpu::OrtDataType::Float16_t from O2/GPU/GPUTracking/ML/convert_float16.h
std::vector<O> inference(std::vector<std::vector<I>>&);

template <class I, class O> // class I is the input data type, e.g. float, class O is the output data type, e.g. OrtDataType::Float16_t from O2/Common/ML/include/ML/GPUORTFloat16.h
void inference(I*, size_t, O*);

// template<class I, class T, class O> // class I is the input data type, e.g. float, class T the throughput data type and class O is the output data type
// std::vector<O> inference(std::vector<I>&);

@@ -79,6 +83,7 @@ class OrtModel
std::vector<std::vector<int64_t>> mInputShapes, mOutputShapes;

// Environment settings
bool mInitialized = false;
std::string modelPath, device = "cpu", dtype = "float"; // device options should be cpu, rocm, migraphx, cuda
int intraOpNumThreads = 0, deviceId = 0, enableProfiling = 0, loggingLevel = 0, allocateDeviceMemory = 0, enableOptimizations = 0;

@@ -89,4 +94,4 @@

} // namespace o2

#endif // O2_ML_ORTINTERFACE_H
#endif // O2_ML_ORTINTERFACE_H
162 changes: 63 additions & 99 deletions Common/ML/src/OrtInterface.cxx
@@ -44,17 +44,19 @@ void OrtModel::reset(std::unordered_map<std::string, std::string> optionsMap)
if (!optionsMap.contains("model-path")) {
LOG(fatal) << "(ORT) Model path cannot be empty!";
}
modelPath = optionsMap["model-path"];
device = (optionsMap.contains("device") ? optionsMap["device"] : "CPU");
dtype = (optionsMap.contains("dtype") ? optionsMap["dtype"] : "float");
deviceId = (optionsMap.contains("device-id") ? std::stoi(optionsMap["device-id"]) : 0);
allocateDeviceMemory = (optionsMap.contains("allocate-device-memory") ? std::stoi(optionsMap["allocate-device-memory"]) : 0);
intraOpNumThreads = (optionsMap.contains("intra-op-num-threads") ? std::stoi(optionsMap["intra-op-num-threads"]) : 0);
loggingLevel = (optionsMap.contains("logging-level") ? std::stoi(optionsMap["logging-level"]) : 2);
enableProfiling = (optionsMap.contains("enable-profiling") ? std::stoi(optionsMap["enable-profiling"]) : 0);
enableOptimizations = (optionsMap.contains("enable-optimizations") ? std::stoi(optionsMap["enable-optimizations"]) : 0);

std::string dev_mem_str = "Hip";

if (!optionsMap["model-path"].empty()) {
modelPath = optionsMap["model-path"];
device = (optionsMap.contains("device") ? optionsMap["device"] : "CPU");
dtype = (optionsMap.contains("dtype") ? optionsMap["dtype"] : "float");
deviceId = (optionsMap.contains("device-id") ? std::stoi(optionsMap["device-id"]) : 0);
allocateDeviceMemory = (optionsMap.contains("allocate-device-memory") ? std::stoi(optionsMap["allocate-device-memory"]) : 0);
intraOpNumThreads = (optionsMap.contains("intra-op-num-threads") ? std::stoi(optionsMap["intra-op-num-threads"]) : 0);
loggingLevel = (optionsMap.contains("logging-level") ? std::stoi(optionsMap["logging-level"]) : 0);
enableProfiling = (optionsMap.contains("enable-profiling") ? std::stoi(optionsMap["enable-profiling"]) : 0);
enableOptimizations = (optionsMap.contains("enable-optimizations") ? std::stoi(optionsMap["enable-optimizations"]) : 0);

std::string dev_mem_str = "Hip";
#if defined(ORT_ROCM_BUILD)
#if ORT_ROCM_BUILD == 1
if (device == "ROCM") {
@@ -93,7 +95,9 @@ void OrtModel::reset(std::unordered_map<std::string, std::string> optionsMap)
} else if (intraOpNumThreads == 1) {
(pImplOrt->sessionOptions).SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);
}
LOG(info) << "(ORT) CPU execution provider set with " << intraOpNumThreads << " threads";
if (loggingLevel < 2) {
LOG(info) << "(ORT) CPU execution provider set with " << intraOpNumThreads << " threads";
}
}

(pImplOrt->sessionOptions).DisableMemPattern();
@@ -109,6 +113,9 @@ void OrtModel::reset(std::unordered_map<std::string, std::string> optionsMap)
} else {
(pImplOrt->sessionOptions).DisableProfiling();
}

mInitialized = true;

(pImplOrt->sessionOptions).SetGraphOptimizationLevel(GraphOptimizationLevel(enableOptimizations));
(pImplOrt->sessionOptions).SetLogSeverityLevel(OrtLoggingLevel(loggingLevel));

@@ -154,16 +161,9 @@ void OrtModel::reset(std::unordered_map<std::string, std::string> optionsMap)
outputNamesChar.resize(mOutputNames.size(), nullptr);
std::transform(std::begin(mOutputNames), std::end(mOutputNames), std::begin(outputNamesChar),
[&](const std::string& str) { return str.c_str(); });

// Print names
LOG(info) << "\tInput Nodes:";
for (size_t i = 0; i < mInputNames.size(); i++) {
LOG(info) << "\t\t" << mInputNames[i] << " : " << printShape(mInputShapes[i]);
}

LOG(info) << "\tOutput Nodes:";
for (size_t i = 0; i < mOutputNames.size(); i++) {
LOG(info) << "\t\t" << mOutputNames[i] << " : " << printShape(mOutputShapes[i]);
if (loggingLevel < 2) {
LOG(info) << "(ORT) Model loaded successfully! (input: " << printShape(mInputShapes[0]) << ", output: " << printShape(mOutputShapes[0]) << ")";
}
}

@@ -187,36 +187,6 @@ std::vector<O> OrtModel::v2v(std::vector<I>& input, bool clearInput)
}
}

template <class I, class O> // class I is the input data type, e.g. float, class O is the output data type, e.g. O2::gpu::OrtDataType::Float16_t from O2/GPU/GPUTracking/ML/convert_float16.h
std::vector<O> OrtModel::inference(std::vector<I>& input)
{
std::vector<int64_t> inputShape{(int64_t)(input.size() / mInputShapes[0][1]), (int64_t)mInputShapes[0][1]};
std::vector<Ort::Value> inputTensor;
inputTensor.emplace_back(Ort::Value::CreateTensor<O>(pImplOrt->memoryInfo, reinterpret_cast<O*>(input.data()), input.size(), inputShape.data(), inputShape.size()));
// input.clear();
auto outputTensors = (pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), inputTensor.data(), inputTensor.size(), outputNamesChar.data(), outputNamesChar.size());
O* outputValues = reinterpret_cast<O*>(outputTensors[0].template GetTensorMutableData<O>());
std::vector<O> outputValuesVec{outputValues, outputValues + inputShape[0] * mOutputShapes[0][1]};
outputTensors.clear();
return outputValuesVec;
}

template <class I, class O> // class I is the input data type, e.g. float, class O is the output data type, e.g. O2::gpu::OrtDataType::Float16_t from O2/GPU/GPUTracking/ML/convert_float16.h
std::vector<O> OrtModel::inference(std::vector<std::vector<I>>& input)
{
std::vector<Ort::Value> inputTensor;
for (auto i : input) {
std::vector<int64_t> inputShape{(int64_t)(i.size() / mInputShapes[0][1]), (int64_t)mInputShapes[0][1]};
inputTensor.emplace_back(Ort::Value::CreateTensor<O>(pImplOrt->memoryInfo, reinterpret_cast<O*>(i.data()), i.size(), inputShape.data(), inputShape.size()));
}
// input.clear();
auto outputTensors = (pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), inputTensor.data(), inputTensor.size(), outputNamesChar.data(), outputNamesChar.size());
O* outputValues = reinterpret_cast<O*>(outputTensors[0].template GetTensorMutableData<O>());
std::vector<O> outputValuesVec{outputValues, outputValues + inputTensor.size() / mInputShapes[0][1] * mOutputShapes[0][1]};
outputTensors.clear();
return outputValuesVec;
}

std::string OrtModel::printShape(const std::vector<int64_t>& v)
{
std::stringstream ss("");
@@ -227,78 +197,72 @@ std::string OrtModel::printShape(const std::vector<int64_t>& v)
return ss.str();
}

template <>
std::vector<float> OrtModel::inference<float, float>(std::vector<float>& input)
template <class I, class O>
std::vector<O> OrtModel::inference(std::vector<I>& input)
{
std::vector<int64_t> inputShape{(int64_t)(input.size() / mInputShapes[0][1]), (int64_t)mInputShapes[0][1]};
std::vector<Ort::Value> inputTensor;
inputTensor.emplace_back(Ort::Value::CreateTensor<float>(pImplOrt->memoryInfo, input.data(), input.size(), inputShape.data(), inputShape.size()));
if constexpr (std::is_same_v<I, OrtDataType::Float16_t>) {
inputTensor.emplace_back(Ort::Value::CreateTensor<Ort::Float16_t>(pImplOrt->memoryInfo, reinterpret_cast<Ort::Float16_t*>(input.data()), input.size(), inputShape.data(), inputShape.size()));
} else {
inputTensor.emplace_back(Ort::Value::CreateTensor<I>(pImplOrt->memoryInfo, input.data(), input.size(), inputShape.data(), inputShape.size()));
}
// input.clear();
auto outputTensors = (pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), inputTensor.data(), inputTensor.size(), outputNamesChar.data(), outputNamesChar.size());
float* outputValues = outputTensors[0].template GetTensorMutableData<float>();
std::vector<float> outputValuesVec{outputValues, outputValues + inputShape[0] * mOutputShapes[0][1]};
O* outputValues = outputTensors[0].template GetTensorMutableData<O>();
std::vector<O> outputValuesVec{outputValues, outputValues + inputShape[0] * mOutputShapes[0][1]};
outputTensors.clear();
return outputValuesVec;
}

template <>
std::vector<float> OrtModel::inference<OrtDataType::Float16_t, float>(std::vector<OrtDataType::Float16_t>& input)
{
std::vector<int64_t> inputShape{(int64_t)(input.size() / mInputShapes[0][1]), (int64_t)mInputShapes[0][1]};
std::vector<Ort::Value> inputTensor;
inputTensor.emplace_back(Ort::Value::CreateTensor<Ort::Float16_t>(pImplOrt->memoryInfo, reinterpret_cast<Ort::Float16_t*>(input.data()), input.size(), inputShape.data(), inputShape.size()));
// input.clear();
auto outputTensors = (pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), inputTensor.data(), inputTensor.size(), outputNamesChar.data(), outputNamesChar.size());
float* outputValues = outputTensors[0].template GetTensorMutableData<float>();
std::vector<float> outputValuesVec{outputValues, outputValues + inputShape[0] * mOutputShapes[0][1]};
outputTensors.clear();
return outputValuesVec;
}
template std::vector<float> OrtModel::inference<float, float>(std::vector<float>&);

template <>
std::vector<OrtDataType::Float16_t> OrtModel::inference<OrtDataType::Float16_t, OrtDataType::Float16_t>(std::vector<OrtDataType::Float16_t>& input)
{
std::vector<int64_t> inputShape{(int64_t)(input.size() / mInputShapes[0][1]), (int64_t)mInputShapes[0][1]};
std::vector<Ort::Value> inputTensor;
inputTensor.emplace_back(Ort::Value::CreateTensor<Ort::Float16_t>(pImplOrt->memoryInfo, reinterpret_cast<Ort::Float16_t*>(input.data()), input.size(), inputShape.data(), inputShape.size()));
// input.clear();
auto outputTensors = (pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), inputTensor.data(), inputTensor.size(), outputNamesChar.data(), outputNamesChar.size());
OrtDataType::Float16_t* outputValues = reinterpret_cast<OrtDataType::Float16_t*>(outputTensors[0].template GetTensorMutableData<Ort::Float16_t>());
std::vector<OrtDataType::Float16_t> outputValuesVec{outputValues, outputValues + inputShape[0] * mOutputShapes[0][1]};
outputTensors.clear();
return outputValuesVec;
}
template std::vector<float> OrtModel::inference<OrtDataType::Float16_t, float>(std::vector<OrtDataType::Float16_t>&);

template <>
std::vector<OrtDataType::Float16_t> OrtModel::inference<float, OrtDataType::Float16_t>(std::vector<float>& input)
template std::vector<OrtDataType::Float16_t> OrtModel::inference<OrtDataType::Float16_t, OrtDataType::Float16_t>(std::vector<OrtDataType::Float16_t>&);

template <class I, class O>
void OrtModel::inference(I* input, size_t input_size, O* output)
{
std::vector<int64_t> inputShape{(int64_t)(input.size() / mInputShapes[0][1]), (int64_t)mInputShapes[0][1]};
std::vector<Ort::Value> inputTensor;
inputTensor.emplace_back(Ort::Value::CreateTensor<Ort::Float16_t>(pImplOrt->memoryInfo, reinterpret_cast<Ort::Float16_t*>(input.data()), input.size(), inputShape.data(), inputShape.size()));
// input.clear();
auto outputTensors = (pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), inputTensor.data(), inputTensor.size(), outputNamesChar.data(), outputNamesChar.size());
OrtDataType::Float16_t* outputValues = reinterpret_cast<OrtDataType::Float16_t*>(outputTensors[0].template GetTensorMutableData<Ort::Float16_t>());
std::vector<OrtDataType::Float16_t> outputValuesVec{outputValues, outputValues + inputShape[0] * mOutputShapes[0][1]};
outputTensors.clear();
return outputValuesVec;
std::vector<int64_t> inputShape{(int64_t)(input_size / mInputShapes[0][1]), (int64_t)mInputShapes[0][1]};
Ort::Value inputTensor = Ort::Value(nullptr);
if constexpr (std::is_same_v<I, OrtDataType::Float16_t>) {
inputTensor = Ort::Value::CreateTensor<Ort::Float16_t>(pImplOrt->memoryInfo, reinterpret_cast<Ort::Float16_t*>(input), input_size, inputShape.data(), inputShape.size());
} else {
inputTensor = Ort::Value::CreateTensor<I>(pImplOrt->memoryInfo, input, input_size, inputShape.data(), inputShape.size());
}

std::vector<int64_t> outputShape{inputShape[0], mOutputShapes[0][1]};
size_t outputSize = (int64_t)(inputShape[0] * mOutputShapes[0][1]);
Ort::Value outputTensor = Ort::Value::CreateTensor<O>(pImplOrt->memoryInfo, output, outputSize, outputShape.data(), outputShape.size());

(pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), &inputTensor, 1, outputNamesChar.data(), &outputTensor, 1);
}

template <>
std::vector<OrtDataType::Float16_t> OrtModel::inference<OrtDataType::Float16_t, OrtDataType::Float16_t>(std::vector<std::vector<OrtDataType::Float16_t>>& input)
template void OrtModel::inference<OrtDataType::Float16_t, float>(OrtDataType::Float16_t*, size_t, float*);

template void OrtModel::inference<float, float>(float*, size_t, float*);

template <class I, class O>
std::vector<O> OrtModel::inference(std::vector<std::vector<I>>& input)
{
std::vector<Ort::Value> inputTensor;
for (auto i : input) {
std::vector<int64_t> inputShape{(int64_t)(i.size() / mInputShapes[0][1]), (int64_t)mInputShapes[0][1]};
inputTensor.emplace_back(Ort::Value::CreateTensor<Ort::Float16_t>(pImplOrt->memoryInfo, reinterpret_cast<Ort::Float16_t*>(i.data()), i.size(), inputShape.data(), inputShape.size()));
if constexpr (std::is_same_v<I, OrtDataType::Float16_t>) {
inputTensor.emplace_back(Ort::Value::CreateTensor<Ort::Float16_t>(pImplOrt->memoryInfo, reinterpret_cast<Ort::Float16_t*>(i.data()), i.size(), inputShape.data(), inputShape.size()));
} else {
inputTensor.emplace_back(Ort::Value::CreateTensor<I>(pImplOrt->memoryInfo, i.data(), i.size(), inputShape.data(), inputShape.size()));
}
}
// input.clear();
auto outputTensors = (pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), inputTensor.data(), inputTensor.size(), outputNamesChar.data(), outputNamesChar.size());
OrtDataType::Float16_t* outputValues = reinterpret_cast<OrtDataType::Float16_t*>(outputTensors[0].template GetTensorMutableData<Ort::Float16_t>());
std::vector<OrtDataType::Float16_t> outputValuesVec{outputValues, outputValues + inputTensor.size() / mInputShapes[0][1] * mOutputShapes[0][1]};
O* outputValues = reinterpret_cast<O*>(outputTensors[0].template GetTensorMutableData<O>());
std::vector<O> outputValuesVec{outputValues, outputValues + inputTensor.size() / mInputShapes[0][1] * mOutputShapes[0][1]};
outputTensors.clear();
return outputValuesVec;
}

} // namespace ml

} // namespace o2
} // namespace o2
7 changes: 7 additions & 0 deletions GPU/GPUTracking/Base/GPUConstantMem.h
@@ -34,6 +34,10 @@
#include "GPUKernelDebugOutput.h"
#endif

#ifdef GPUCA_HAS_ONNX
#include "GPUTPCNNClusterizer.h"
#endif

namespace o2::gpu
{
struct GPUConstantMem {
@@ -55,6 +59,9 @@ struct GPUConstantMem {
#ifdef GPUCA_KERNEL_DEBUGGER_OUTPUT
GPUKernelDebugOutput debugOutput;
#endif
#ifdef GPUCA_HAS_ONNX
GPUTPCNNClusterizer tpcNNClusterer[GPUCA_NSECTORS];
#endif

template <int32_t I>
GPUd() auto& getTRDTracker();
1 change: 1 addition & 0 deletions GPU/GPUTracking/Base/GPUMemoryResource.h
@@ -28,6 +28,7 @@ struct GPUMemoryReuse {
};
enum Group : uint16_t {
ClustererScratch,
NNClusterer,
ClustererZS,
TrackerScratch,
TrackerDataLinks,
3 changes: 3 additions & 0 deletions GPU/GPUTracking/Base/GPUReconstruction.cxx
@@ -93,6 +93,9 @@ GPUReconstruction::GPUReconstruction(const GPUSettingsDeviceBackend& cfg) : mHos
for (uint32_t i = 0; i < NSECTORS; i++) {
processors()->tpcTrackers[i].SetSector(i); // TODO: Move to a better place
processors()->tpcClusterer[i].mISector = i;
#ifdef GPUCA_HAS_ONNX
processors()->tpcNNClusterer[i].mISector = i;
#endif
}
#ifndef GPUCA_NO_ROOT
mROOTDump = GPUROOTDumpCore::getAndCreate();
12 changes: 9 additions & 3 deletions GPU/GPUTracking/CMakeLists.txt
@@ -160,8 +160,8 @@ set(HDRS_INSTALL
)

set(SRCS_NO_CINT ${SRCS_NO_CINT} display/GPUDisplayInterface.cxx)
set(SRCS_NO_CINT
${SRCS_NO_CINT}

set(SRCS_NO_CINT ${SRCS_NO_CINT}
Global/GPUChainITS.cxx
ITS/GPUITSFitter.cxx
ITS/GPUITSFitterKernels.cxx
@@ -192,6 +192,10 @@ set(SRCS_NO_CINT
Refit/GPUTrackingRefitKernel.cxx
Merger/GPUTPCGMO2Output.cxx)

if(NOT ALIGPU_BUILD_TYPE STREQUAL "Standalone")
list(APPEND SRCS_NO_CINT TPCClusterFinder/GPUTPCNNClusterizerKernels.cxx TPCClusterFinder/GPUTPCNNClusterizer.cxx TPCClusterFinder/GPUTPCNNClusterizerHost.cxx)
endif()

set(SRCS_DATATYPES
${SRCS_DATATYPES}
DataTypes/TPCPadGainCalib.cxx
@@ -271,9 +275,11 @@ if(ALIGPU_BUILD_TYPE STREQUAL "O2")
O2::GPUCommon
O2::ReconstructionDataFormats
O2::TPCFastTransformation
O2::ML
PRIVATE_LINK_LIBRARIES O2::DataFormatsTPC
SOURCES ${SRCS_DATATYPES})
target_compile_definitions(${targetName} PRIVATE GPUCA_O2_LIB GPUCA_TPC_GEOMETRY_O2)
target_compile_definitions(${targetName} PRIVATE GPUCA_O2_LIB GPUCA_TPC_GEOMETRY_O2 GPUCA_HAS_ONNX=1)

o2_target_root_dictionary(GPUDataTypes
HEADERS ${HDRS_CINT_DATATYPES} ${HDRS_CINT_O2_ADDITIONAL}
LINKDEF GPUTrackingLinkDef_O2_DataTypes.h)
13 changes: 13 additions & 0 deletions GPU/GPUTracking/Definitions/GPUDefGPUParameters.h
@@ -489,6 +489,9 @@
#ifndef GPUCA_LB_GPUTPCCFClusterizer
#define GPUCA_LB_GPUTPCCFClusterizer 512
#endif
#ifndef GPUCA_LB_GPUTPCNNClusterizerKernels
#define GPUCA_LB_GPUTPCNNClusterizerKernels 512
#endif
#ifndef GPUCA_LB_GPUTrackingRefitKernel_mode0asGPU
#define GPUCA_LB_GPUTrackingRefitKernel_mode0asGPU 256
#endif
@@ -507,6 +510,16 @@

#define GPUCA_LB_GPUTPCCFNoiseSuppression_noiseSuppression GPUCA_LB_GPUTPCCFNoiseSuppression
#define GPUCA_LB_GPUTPCCFNoiseSuppression_updatePeaks GPUCA_LB_GPUTPCCFNoiseSuppression

#ifdef GPUCA_HAS_ONNX
#define GPUCA_LB_GPUTPCNNClusterizerKernels_runCfClusterizer GPUCA_LB_GPUTPCNNClusterizerKernels
#define GPUCA_LB_GPUTPCNNClusterizerKernels_fillInputNN GPUCA_LB_GPUTPCNNClusterizerKernels
#define GPUCA_LB_GPUTPCNNClusterizerKernels_determineClass1Labels GPUCA_LB_GPUTPCNNClusterizerKernels
#define GPUCA_LB_GPUTPCNNClusterizerKernels_determineClass2Labels GPUCA_LB_GPUTPCNNClusterizerKernels
#define GPUCA_LB_GPUTPCNNClusterizerKernels_publishClass1Regression GPUCA_LB_GPUTPCNNClusterizerKernels
#define GPUCA_LB_GPUTPCNNClusterizerKernels_publishClass2Regression GPUCA_LB_GPUTPCNNClusterizerKernels
#endif

#define GPUCA_LB_GPUTPCCFStreamCompaction_scanStart GPUCA_THREAD_COUNT_SCAN
#define GPUCA_LB_GPUTPCCFStreamCompaction_scanUp GPUCA_THREAD_COUNT_SCAN
#define GPUCA_LB_GPUTPCCFStreamCompaction_scanTop GPUCA_THREAD_COUNT_SCAN