Alternative variable binning approach #849
base: master
Before doing a more detailed review, let me try to summarise a few key aspects of this PR:
In conclusion, I am highly in favour of the approach proposed in this PR over that in #835.
Hi, sorry for being very late to this discussion. I think the code looks good and it is a good option for a variable binning with one "split" variable.
I understand that #835 introduces a lot of changes. While I tried to avoid affecting existing functionality as much as possible, I fully understand if people don't feel comfortable pushing it to main and prefer adding this version instead.
In my case I specifically needed to introduce arbitrary cuts for classification, so I don't think this solution would be suitable for my analysis. However, since I might be the only person who needs this functionality for now, I would not have a problem working in a separate branch.
Hi Maria, a set of n "arbitrary" cuts/selection criteria can be represented by a one-dimensional binning too, can't it? You would just have to evaluate which one of your cuts each event satisfies and add one unique number per such cut to each event in the events file before running the pipeline, then define the one-dimensional
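To make the labelling step suggested above concrete, here is a minimal sketch in plain NumPy. It is not code from this PR or from the pipeline: the function `label_events`, the event fields `pid` and `n_hits`, and the three example cuts are all hypothetical and only illustrate how each event could be tagged with one integer per cut before the events file is written.

```python
import numpy as np


def label_events(events, cuts, default=-1):
    """Return one integer label per event.

    The label is the index of the first cut the event satisfies,
    or `default` if it satisfies none. (Hypothetical helper.)
    """
    conditions = [cut(events) for cut in cuts]
    choices = list(range(len(cuts)))
    # np.select picks, per event, the choice of the first true condition
    return np.select(conditions, choices, default=default)


# Toy events; field names are assumptions for illustration only
events = {
    "pid": np.array([0.1, 0.9, 0.5, 0.95]),
    "n_hits": np.array([3, 12, 8, 2]),
}

# "Arbitrary" cuts, evaluated in order of priority
cuts = [
    lambda ev: (ev["pid"] > 0.8) & (ev["n_hits"] >= 10),  # label 0
    lambda ev: ev["pid"] > 0.8,                           # label 1
    lambda ev: ev["pid"] <= 0.8,                          # label 2
]

# Store the label as an extra per-event variable
events["cut_var"] = label_events(events, cuts)
```

With integer labels like these, a one-dimensional binning with edges at half-integers (here -0.5, 0.5, 1.5, 2.5) would put each cut into its own bin. Whether that covers the classification use case discussed here is for the people involved to judge.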
Similar to #835, this PR introduces the option to use different (regular) binnings in an analysis.
Which events use which binning depends on a separate variable called cut_var. This can be for example the pid value but also the number of hit modules.
I tried to modify as little code as possible but also provide all necessary changes to use the new binning type. An example notebook is also provided. This PR introduces a new binning class `VarBinning`, which basically just holds multiple `MultiDimBinning` objects and one `OneDimBinning` which represents the cut variable. The main change when using the `VarBinning` class is that the histogramming does not happen in the dedicated stage but in the output function of the pipeline. Consequently, a pipeline using `VarBinning` cannot have a hist stage.

The way a `VarBinning` is defined is by passing a list in the binning config file.
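As a rough illustration of the idea (not the implementation in this PR and not the binning config syntax), the sketch below uses plain NumPy to histogram events with a different binning depending on which bin of the cut variable they fall into. The function `hist_per_selection`, the field names, and all bin edges are assumptions made up for this example.

```python
import numpy as np


def hist_per_selection(events, cut_var, cut_edges, binnings, weights_key="weights"):
    """Histogram events separately per cut-variable bin.

    `binnings` holds one (variable name, bin edges) pair per bin of the
    cut variable. (Hypothetical helper, 1D binnings only for brevity.)
    """
    # Index of the cut-variable bin each event falls into
    selection = np.digitize(events[cut_var], cut_edges) - 1
    hists = []
    for i, (var, edges) in enumerate(binnings):
        mask = selection == i
        h, _ = np.histogram(
            events[var][mask], bins=edges, weights=events[weights_key][mask]
        )
        hists.append(h)
    return hists


# Toy events; field names are assumptions for illustration only
events = {
    "pid": np.array([0.1, 0.9, 0.5, 0.95]),
    "energy": np.array([5.0, 20.0, 50.0, 8.0]),
    "weights": np.ones(4),
}

hists = hist_per_selection(
    events,
    cut_var="pid",
    cut_edges=[0.0, 0.8, 1.0],            # two selections: pid < 0.8 and pid >= 0.8
    binnings=[
        ("energy", [1, 10, 100]),         # coarse binning for the first selection
        ("energy", [1, 5, 10, 50, 100]),  # finer binning for the second selection
    ],
)
```

The result is one histogram per selection, each with its own shape, which is why the histogramming cannot be expressed as a single regular binning inside one stage.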