Approximate PR/ROC AUC Metrics #2319

@vfdev-5

🚀 Feature

Taken from this colab by @EricZimmermann

Description

Problem

For large tensors, computing AUC metrics over multiple thresholds is computationally expensive and slow. For a sufficiently large dataset, caching or saving all outputs is too expensive, so the metric would otherwise have to be computed in post-processing.

Solution

Assuming the distribution of values is known and bounded, approximate the integral via a Riemann sum over a set of fixed or variable step sizes.
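
As a sketch of what such a sum could look like for ROC AUC: take increasing thresholds $t_1 < t_2 < \dots < t_n$ and let $\mathrm{TPR}(t)$ and $\mathrm{FPR}(t)$ be the rates obtained when every score $\ge t$ is predicted positive. The trapezoidal variant below is an assumption on my part (a plain left or right Riemann sum would use a single $\mathrm{TPR}$ value per step), and it presumes the thresholds cover the score range so that $\mathrm{FPR}(t_1) \approx 1$ and $\mathrm{FPR}(t_n) \approx 0$.

$$
\mathrm{AUC}_{\mathrm{ROC}} \approx \sum_{i=1}^{n-1} \left[ \mathrm{FPR}(t_i) - \mathrm{FPR}(t_{i+1}) \right] \cdot \frac{\mathrm{TPR}(t_i) + \mathrm{TPR}(t_{i+1})}{2}
$$

Each bracketed difference is non-negative because $\mathrm{FPR}$ is non-increasing in $t$, so it plays the role of the (variable) step size.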

Let a set of monotonically increasing thresholds $T = \{t_1, t_2, \dots, t_n\}$ define the bin edges used to approximate the integral (AUC). At each iteration, update cached counts (a histogram) of how many values fall into each bin between two consecutive thresholds. When all iterations are complete, approximate the statistic from ratios of the accumulated counts.
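
A minimal sketch of this accumulate-then-reduce idea, in plain Python for readability rather than as the GPU implementation proposed here. The class name `HistogramAUC` and its methods are hypothetical, a score is assumed to count as positive at threshold `t` when `score >= t`, and the curve is closed with the points (0, 0) and (1, 1) before applying the trapezoidal rule.

```python
from bisect import bisect_right


class HistogramAUC:
    """Streaming ROC AUC approximation from per-bin counts (hypothetical helper)."""

    def __init__(self, thresholds):
        self.thresholds = list(thresholds)
        if any(a >= b for a, b in zip(self.thresholds, self.thresholds[1:])):
            raise ValueError("thresholds must be strictly increasing")
        # n thresholds give n + 1 bins; bin 0 and bin n are open-ended.
        n_bins = len(self.thresholds) + 1
        self.pos = [0] * n_bins  # positive samples per bin
        self.neg = [0] * n_bins  # negative samples per bin

    def update(self, scores, labels):
        """Add one batch to the histograms; no raw scores are stored."""
        for score, label in zip(scores, labels):
            # Bin index = number of thresholds <= score.
            b = bisect_right(self.thresholds, score)
            if label:
                self.pos[b] += 1
            else:
                self.neg[b] += 1

    def rates(self):
        """Return (fpr, tpr) per threshold, ordered from highest to lowest threshold."""
        total_pos, total_neg = sum(self.pos), sum(self.neg)
        if total_pos == 0 or total_neg == 0:
            raise ValueError("need at least one positive and one negative sample")
        fpr, tpr = [], []
        tp = fp = 0
        # Samples in bins k + 1 and above have score >= thresholds[k].
        for k in range(len(self.thresholds) - 1, -1, -1):
            tp += self.pos[k + 1]
            fp += self.neg[k + 1]
            tpr.append(tp / total_pos)
            fpr.append(fp / total_neg)
        return fpr, tpr

    def roc_auc(self):
        """Trapezoidal approximation of the area under the ROC curve."""
        fpr, tpr = self.rates()
        xs = [0.0] + fpr + [1.0]
        ys = [0.0] + tpr + [1.0]
        return sum(
            (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
            for i in range(len(xs) - 1)
        )


metric = HistogramAUC([i / 10 for i in range(11)])  # thresholds 0.0, 0.1, ..., 1.0
metric.update([0.1, 0.4], [0, 0])   # first batch
metric.update([0.35, 0.8], [1, 1])  # second batch
auc = metric.roc_auc()
print(auc)  # 0.75
```

Memory use depends only on the number of thresholds, not on the number of samples seen. In this toy example every score lands in its own bin, so the result matches the exact ROC AUC; with coarser bins, samples sharing a bin are treated as tied and the value is only an approximation.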

Run the computation on the GPU for speed, removing the CPU bottleneck at each iteration.

Applications

Optimizing a model for voxel-level PR AUC / ROC AUC (each voxel treated as an independent sample), e.g. semantic pathology segmentation

Enable the user to validate a model at its best operating point (e.g. F1 at a threshold other than 0.5)
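
To illustrate the operating-point use case, here is a hedged sketch that picks the threshold with the highest F1 directly from per-bin counts. The function name `best_f1_threshold` and the count layout are assumptions for illustration: `pos_counts[b]` and `neg_counts[b]` hold the positives and negatives in bin `b`, where bin `b` spans `[thresholds[b - 1], thresholds[b])` and the first and last bins are open-ended.

```python
def best_f1_threshold(thresholds, pos_counts, neg_counts):
    """Return (threshold, f1) maximizing F1, using only histogram counts.

    Hypothetical helper: a sample is predicted positive at threshold t
    when its score is >= t, i.e. when it falls in a bin above t.
    """
    total_pos = sum(pos_counts)
    best_t, best_f1 = None, -1.0
    tp = fp = 0
    # Sweep from the highest threshold down, growing the predicted-positive set.
    for k in range(len(thresholds) - 1, -1, -1):
        tp += pos_counts[k + 1]
        fp += neg_counts[k + 1]
        fn = total_pos - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # ties keep the higher threshold
            best_t, best_f1 = thresholds[k], f1
    return best_t, best_f1


# Three thresholds give four bins; counts are made-up illustrative numbers.
thresholds = [0.25, 0.5, 0.75]
pos_counts = [1, 2, 3, 4]
neg_counts = [6, 3, 1, 0]
best_t, best_f1 = best_f1_threshold(thresholds, pos_counts, neg_counts)
print(best_t, round(best_f1, 4))  # 0.25 0.7826
```

Because only the counts are needed, the best operating point can be read off at the end of validation without storing any predictions.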

Limitations

The domain (covered by the thresholds) must be known ahead of time. This can be mitigated by placing many thresholds over a very wide range, while keeping num thresholds << num voxels in the output tensor

Future work could use a heuristic to widen / narrow the number of thresholds based on previous iterations

Context:

Code:
