Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TPC QC: Add skeleton for GPUErrorQA task #13964

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

ariedel-cern
Copy link

Add skeleton for GPUErrorQA task in O2.

In this draft, only the number of errors for a given error code is counted and put into a 1D histogram.

In principle the error is reporting 4 values, the error code +3 parameters (like SectorRow, Sector, Value, Max ...). However, the meaning of the parameters depends on the error code itself (defined in GPUErrorCodes.h). There was a suggestion in the past to display the errors in a 2D histogram.

In principle, we could implement 3 2D histograms, putting the error code on the x-axis and the different parameter values on the y-axis.

Tagging @wiechula and @davidrohr for their opinion.

Copy link
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and removes <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1

@davidrohr
Copy link
Collaborator

I think a 2D histogram might make sense, at least as an overview, since you would see al error types.
Perhaps even a 1D, just with the counts for the error codes could give a good first overview?
Ideally, could we put the error code names as ticks on the x-axis.

Then we might want additional 3D histograms on top, for single error codes, with multiple parameters?
I guess adding the histograms is simple? We could also just start with something and then extend?

And to be honest, I am not so familiar with other QC tasks, so I don't know what is the normal strategy.

@ariedel-cern ariedel-cern marked this pull request as ready for review March 6, 2025 11:11
@ariedel-cern
Copy link
Author

Thanks for the comments, David.
For now I put only 1 histogram which counts the occurring errors. Then I can also finish up the implementation in QC itself.
I think memory consumption is always a bit of a concern for QC. If you can tell which errors are of special importance we can discuss if we can have dedicated higher dimensional histograms for these.

Comment on lines +30 to +35
namespace o2
{
namespace tpc
{
namespace qc
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
namespace o2
{
namespace tpc
{
namespace qc
{
namespace o2::tpc::qc
{

Comment on lines +69 to +71
} // namespace qc
} // namespace tpc
} // namespace o2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} // namespace qc
} // namespace tpc
} // namespace o2
} // namespace o2::tpc::qc

Comment on lines +24 to +28
// root includes
#include "TH1.h"

// o2 includes
// #include "DataFormatsTPC/Defs.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use forward declaration for TH1

Suggested change
// root includes
#include "TH1.h"
// o2 includes
// #include "DataFormatsTPC/Defs.h"
class TH1;

void resetHistograms();

/// return histograms
std::unordered_map<std::string, std::unique_ptr<TH1>>& getMapHist() { return mMapHist; };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the histograms don't need to be filled outside, this can be const

Suggested change
std::unordered_map<std::string, std::unique_ptr<TH1>>& getMapHist() { return mMapHist; };
std::unordered_map<std::string, std::unique_ptr<TH1>>& getMapHist() const { return mMapHist; };

#ifndef AliceO2_TPC_QC_GPUERRORQA_H
#define AliceO2_TPC_QC_GPUERRORQA_H

#include <memory>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <memory>
#include <string>
#include <memory>


// root includes
#include "TFile.h"
#include <TH1.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <TH1.h>
#include "TH1.h"

Comment on lines +15 to +16
#include <memory>
#include <unordered_map>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already defined in the header

Suggested change
#include <memory>
#include <unordered_map>

};

// 1D histogram counting all reported errors
mMapHist["ErrorCounter"] = std::make_unique<TH1F>("ErrorCounter", "ErrorCounter", errorNames.size(), 0, errorNames.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a histogram of floats? If it is a simple counter, it could also be TH1I, unless one would like to normalize at some point.

Comment on lines +73 to +78
for (const auto& [name, hist] : mMapHist) {
TObjArray arr;
arr.SetName(name.data());
arr.Add(hist.get());
arr.Write(arr.GetName(), TObject::kSingleKey);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it does not make sense to put each histogram in a separate array.

Suggested change
for (const auto& [name, hist] : mMapHist) {
TObjArray arr;
arr.SetName(name.data());
arr.Add(hist.get());
arr.Write(arr.GetName(), TObject::kSingleKey);
}
TObjArray arr;
arr.SetName("GPUError_Hists");
for (const auto& [name, hist] : mMapHist) {
arr.Add(hist.get());
}
arr.Write(arr.GetName(), TObject::kSingleKey);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants