[Enhancement] Add float datatype support to rocAucScore #3074

icfaust · 2025-02-16T15:20:11Z

Description

The algorithm rocAucScore is currently forced to do all computation in doubles. This does not follow the coding standards of the codebase, under which it should be templated on both the CPU ISA and the datatype (float or double). To interface with this code from outside DAAL (i.e. from oneDAL), it must also provide a float version. The new datatype template parameter defaults to double to guarantee compatibility with daal4py. These changes do not affect the performance or operation of daal4py or of any code that currently interfaces with rocAucScore, so performance benchmarking is unnecessary.

This PR is a precursor to enabling a oneDAL version of rocAucScore. It completes issue #2740.

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

icfaust · 2025-02-17T05:37:32Z

/intelci: run

Alexsandruss

Not backward-compatible:

miniconda3/envs/build/lib/python3.11/site-packages/daal4py/__init__.py:54: in <module>
    from daal4py._daal4py import *
E   ImportError: /.../miniconda3/envs/build/lib/python3.11/site-packages/daal4py/_daal4py.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN4daal15data_management8internal11rocAucScoreERKNS_8services10interface19SharedPtrINS0_10interface112NumericTableEEES9_

icfaust · 2025-02-17T13:26:45Z

This PR is on hold until CI can track ABI changes, as this requirement is documented neither in the oneDAL development rules nor in the PR checklist.

david-cortes-intel · 2025-02-26T11:43:24Z

@icfaust I haven't checked the code in detail, but doesn't the fastest way of calculating this metric involve computations of numbers up to "nrows^2" in a loop where they are added/subtracted to similar numbers? The fp32 data type only has integer-level precision up to 2^23 which is easy to exceed, after which the precision of the calculation might degrade.
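To make the precision concern concrete: assuming `float` is IEEE 754 binary32, it stores 23 explicit mantissa bits (24 significant bits with the implicit leading one), so consecutive integers are exactly representable only up to 2^24 = 16,777,216. A count of the order of nrows^2 reaches that value at nrows = 4096. The sketch below is only an illustration of that limit; the function names are hypothetical and not part of oneDAL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if adding 1 to n still changes the
// value when the running count is held in a float.
inline bool floatCanCountPast(std::int64_t n)
{
    assert(n >= 0);
    const float before = static_cast<float>(n);
    // volatile forces the sum to be rounded to float before comparing
    volatile float after = before + 1.0f;
    return after != before;
}

// Same check with the running count held in a double.
inline bool doubleCanCountPast(std::int64_t n)
{
    assert(n >= 0);
    const double before = static_cast<double>(n);
    volatile double after = before + 1.0;
    return after != before;
}
```

Under that assumption, `floatCanCountPast(1LL << 24)` is false because 16,777,217 rounds back to 16,777,216, while the double version still counts correctly at that magnitude.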

commit 35e0b62 Author: Faust, Ian <[email protected]> Date: Mon Mar 10 14:56:52 2025 +0100 renew
commit 1c9299c Author: Ian Faust <[email protected]> Date: Mon Mar 10 07:24:21 2025 +0100 Update ci.yml
commit c6a7b4a Author: Ian Faust <[email protected]> Date: Mon Mar 10 06:44:43 2025 +0100 Update ci.yml
commit 18dac73 Author: Ian Faust <[email protected]> Date: Thu Mar 6 01:28:24 2025 +0100 Update ci.yml
commit d84c663 Author: Ian Faust <[email protected]> Date: Thu Mar 6 00:53:40 2025 +0100 Update ci.yml
commit 91a523d Author: Ian Faust <[email protected]> Date: Thu Mar 6 00:05:02 2025 +0100 Update ci.yml
commit a1048df Author: Ian Faust <[email protected]> Date: Wed Mar 5 23:54:30 2025 +0100 Update ci.yml
commit b6a84b8 Author: Ian Faust <[email protected]> Date: Wed Mar 5 23:18:26 2025 +0100 Update ci.yml
commit d496c2f Author: Ian Faust <[email protected]> Date: Wed Mar 5 22:18:02 2025 +0100 Update ci.yml
commit 272fdfe Author: Ian Faust <[email protected]> Date: Wed Mar 5 19:15:04 2025 +0100 Update ci.yml
commit 05ee213 Author: Ian Faust <[email protected]> Date: Wed Mar 5 18:35:28 2025 +0100 Update ci.yml
commit fd8bf73 Author: Faust, Ian <[email protected]> Date: Wed Mar 5 18:17:24 2025 +0100 remove false comment
commit 71c19df Author: Faust, Ian <[email protected]> Date: Wed Mar 5 18:10:59 2025 +0100 first try

icfaust · 2025-03-24T19:58:15Z

> @icfaust I haven't checked the code in detail, but doesn't the fastest way of calculating this metric involve computations of numbers up to "nrows^2" in a loop where they are added/subtracted to similar numbers? The fp32 data type only has integer-level precision up to 2^23 which is easy to exceed, after which the precision of the calculation might degrade.

Seems logically sound to me. I kept the core computation in double, but it now still interfaces with float data.
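The approach described here (float or double inputs, double accumulation) could look roughly like the sketch below. It is not the oneDAL implementation: it uses a rank-sum (Mann-Whitney) formulation over plain `std::vector` inputs, and the name `rocAucSketch`, the `FPType` parameter, and the NaN return for single-class input are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical sketch: inputs may be float or double, but every
// accumulator is a double so large rank sums keep integer precision.
template <typename FPType = double>
double rocAucSketch(const std::vector<FPType> & yTrue, const std::vector<FPType> & yScore)
{
    assert(yTrue.size() == yScore.size());
    const std::size_t n = yScore.size();

    // Sort indices by ascending score.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return yScore[a] < yScore[b]; });

    double rankSumPos = 0.0; // sum of (tie-averaged) ranks of positives
    double nPos       = 0.0;
    std::size_t i     = 0;
    while (i < n)
    {
        // Find the run [i, j) of tied scores; it covers ranks i+1 .. j.
        std::size_t j = i;
        while (j < n && yScore[order[j]] == yScore[order[i]]) ++j;
        const double avgRank = 0.5 * static_cast<double>(i + 1 + j);
        for (std::size_t k = i; k < j; ++k)
        {
            if (yTrue[order[k]] == FPType(1))
            {
                rankSumPos += avgRank;
                nPos += 1.0;
            }
        }
        i = j;
    }

    const double nNeg = static_cast<double>(n) - nPos;
    // AUC is undefined with a single class; returning NaN is an arbitrary choice here.
    if (nPos == 0.0 || nNeg == 0.0) return std::numeric_limits<double>::quiet_NaN();
    return (rankSumPos - nPos * (nPos + 1.0) / 2.0) / (nPos * nNeg);
}
```

For labels {0, 0, 1, 1} and scores {0.1, 0.4, 0.35, 0.8} this yields 0.75 for both `float` and `double` inputs, since only the storage type of the inputs differs while the arithmetic stays in double.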

icfaust · 2025-03-30T20:02:02Z

/intelci: run

napetrov · 2025-03-31T18:22:55Z

cpp/daal/src/data_management/roc_auc_score.cpp

+template DAAL_EXPORT double rocAucScore<float>(const NumericTablePtr & truePrediction, const NumericTablePtr & testPrediction);
+template DAAL_EXPORT double rocAucScore<double>(const NumericTablePtr & truePrediction, const NumericTablePtr & testPrediction);
+
+// necessary for maintaining ABI


Should we mark this so it would be easy to identify and remove in 2026.x release?
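For context on why such a shim is needed: the undefined symbol in the earlier ImportError appears to be the mangled name of the original non-template, two-argument rocAucScore, and a function template instantiation gets a different mangled name even when its parameters are identical. The sketch below shows the general pattern under simplified assumptions; `Table`, `score`, and the `sketch` namespace are hypothetical stand-ins (the real code uses NumericTablePtr and DAAL_EXPORT), and the body is a placeholder computation rather than an AUC.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
using Table = std::vector<double>; // hypothetical stand-in for NumericTablePtr

// New templated entry point; FPType defaults to double.
template <typename FPType = double>
double score(const Table & a, const Table & b)
{
    assert(a.size() == b.size());
    double acc = 0.0; // accumulate in double regardless of FPType
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        acc += static_cast<double>(static_cast<FPType>(a[i]) * static_cast<FPType>(b[i]));
    }
    return acc;
}

// Explicit instantiations, mirroring the two lines in the diff above.
template double score<float>(const Table &, const Table &);
template double score<double>(const Table &, const Table &);

// Non-template overload kept only so the old mangled symbol still exists
// for binaries built against the previous release.
double score(const Table & a, const Table & b)
{
    return score<double>(a, b);
}
} // namespace sketch
```

A plain call `sketch::score(a, b)` resolves to the non-template overload, while `sketch::score<float>(a, b)` selects the template explicitly, so old and new callers can coexist until the shim is removed.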

icfaust added 2 commits February 16, 2025 16:17

Update roc_auc_score.cpp

49d0c89

Update roc_auc_score.cpp

d6832ec

icfaust requested review from Alexsandruss, samir-nasibli and Alexandr-Solovev as code owners February 16, 2025 15:20

icfaust added 4 commits February 16, 2025 17:21

Update roc_auc_score.cpp

6b8d263

Update roc_auc_score.cpp

545aa74

Update roc_auc_score.h

a953a76

Update roc_auc_score.cpp

5522e57

icfaust added the enhancement label Feb 16, 2025

icfaust added 2 commits February 17, 2025 00:13

Update roc_auc_score.cpp

b95d450

clang-formatting and comment

c89cd33

icfaust changed the title ~~[Enhancement] Add float datatype support to Roc_auc_score~~ [Enhancement] Add float datatype support to rocAucScore Feb 17, 2025

Alexandr-Solovev approved these changes Feb 17, 2025

Alexsandruss requested changes Feb 17, 2025

icfaust marked this pull request as draft February 17, 2025 13:25

Merge branch 'uxlfoundation:main' into roc_auc_float

c2ee01e

dominik-mich mentioned this pull request Mar 4, 2025

Add single precision support for roc_auc_score #2740

Open

icfaust and others added 5 commits March 18, 2025 22:18

chmod

036daf6

Merge branch 'main' into roc_auc_float

a1f7dea

Update abi_check.sh

0f96b5c

Update makefile

d41c5f3

icfaust mentioned this pull request Mar 19, 2025

[CI, enhancement] enforce ABI checking of linux DPCPP build #3112

Merged

13 tasks

icfaust added 3 commits March 19, 2025 16:43

Merge branch 'uxlfoundation:main' into roc_auc_float

5da0d71

Update roc_auc_score.cpp

6c5d956

Update roc_auc_score.h

20a8b39

icfaust added 3 commits March 23, 2025 17:06

Update roc_auc_score.h

9b26413

Update roc_auc_score.cpp

719e5fb

Merge branch 'main' into roc_auc_float

8112579

icfaust marked this pull request as ready for review March 24, 2025 11:30

icfaust requested review from napetrov, homksei, ahuber21 and ethanglaser as code owners March 24, 2025 11:30

icfaust added 4 commits March 24, 2025 12:31

Merge branch 'main' into roc_auc_float

e1c4b09

add ABI fix

8b662c1

Update abi_check.sh

514089e

Merge branch 'main' into roc_auc_float

32be362

icfaust requested a review from Alexsandruss March 26, 2025 12:31

Alexsandruss approved these changes Mar 31, 2025

david-cortes-intel approved these changes Mar 31, 2025

icfaust merged commit a00ac10 into uxlfoundation:main Mar 31, 2025
18 of 19 checks passed

icfaust deleted the roc_auc_float branch March 31, 2025 17:49

napetrov reviewed Mar 31, 2025
