[FEATURE REQUEST] Unsupervised Evaluation Metrics #131
Comments
@anton164 Thanks for the comment. You highlight a fundamental challenge with anomaly detection -- often, ground truth labels are unavailable. But in my experience, the most common metrics people use to evaluate anomaly detection algorithms are the ones supported in Merlion, all of which require ground truth labels. If you have (1) specific unsupervised metrics in mind, and (2) a compelling use case for them, you are welcome to open a pull request adding them to the repo, and I can review it. But for the time being, I'm not sure how useful these unsupervised metrics would really be.
Thanks for your prompt reply @aadyotb. I might do that to demonstrate what I mean. Which classes would you recommend I extend for that demonstration? From a design perspective, the TSADEvaluator, which does historical analysis, is "coupled" to ground truth label evaluation, so maybe I'll implement another version of it that isn't. From an unsupervised perspective, it would be useful to have a simple way to evaluate the following metrics:
As you point out, GT labels are often unavailable, so it's surprising to me that Merlion, which promises to be a complete framework for TS anomaly detection, does not have any guidance here. Happy to try to incorporate some ideas :) One flow I would like to support is self-supervision using Merlion:
Thanks for clarifying. From an implementation perspective, I'd suggest leaving
For distribution statistics, one potentially interesting direction would be to characterize the amount the test scores deviate from a standard normal distribution, since calibration reshapes the distribution of training scores to look like a standard normal (note that this is more sophisticated than mean/variance normalization). So if the test scores don't seem like they've been drawn from a standard normal, this could be an indicator of distribution shift over time.
I'm much more hesitant to support the self-supervised labeling approach. In practice, time series anomalies vary widely (raw spikes/dips, changes in trend, deviations from standard seasonal patterns, ...). When dealing with multivariate time series, things get even more complex. Simple models often either fail to detect these more complex anomalies, or have low precision when doing so. And in many cases, users care about detecting one type of anomaly but not another. Beyond getting actual labels (and even that can be controversial), I unfortunately don't have a great answer for this problem, and I haven't seen one in the literature either.
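To make the distribution-statistics idea above concrete, here is a minimal sketch that measures how far a batch of scores sits from a standard normal, using the one-sample Kolmogorov-Smirnov statistic. The function name `ks_statistic_vs_standard_normal` is hypothetical and not part of Merlion, and the synthetic lists only stand in for calibrated anomaly scores. The KS statistic is one possible choice of deviation measure, not one proposed in this thread.

```python
import random
from statistics import NormalDist


def ks_statistic_vs_standard_normal(scores):
    """One-sample Kolmogorov-Smirnov statistic of `scores` against N(0, 1).

    Hypothetical helper (not a Merlion API). Larger values mean the
    empirical distribution of the scores is further from a standard normal.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    xs = sorted(scores)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = std_normal.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so compare
        # the reference CDF against both sides of the jump.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Synthetic stand-ins for anomaly scores (assumption: real scores would
# come from a calibrated detector).
rng = random.Random(0)
calibrated = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # looks like N(0, 1)
shifted = [rng.gauss(1.0, 2.0) for _ in range(2000)]     # drifted scores

ks_calibrated = ks_statistic_vs_standard_normal(calibrated)
ks_shifted = ks_statistic_vs_standard_normal(shifted)
print(f"calibrated: {ks_calibrated:.3f}, shifted: {ks_shifted:.3f}")
```

A statistic that grows across successive test windows would be one possible signal of the distribution shift described above. What counts as "too large" depends on the window size and would need tuning per use case.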
Thanks for sharing your thoughts @aadyotb! I will give it a try and report back once I have a demo in Merlion
Is your feature request related to a problem? Please describe.
The current evaluation metrics in evaluate/anomaly.py assume that a ground truth is available. However, in many time series anomaly detection problems there is no ground truth. It would be great if the Merlion evaluation base classes were more general and supportive of this use case. As of now, we effectively have to implement our own evaluation methods.
Describe the solution you'd like
I think ideally methods/classes such as TSADEvaluator.evaluate, TSADScoreAccumulator and accumulate_tsad_score should not assume that there is a ground truth - other interfaces in the Merlion package typically take test labels as an optional argument. Similarly, the evaluation classes should be able to compute unsupervised descriptive statistics if a ground truth is not passed.
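As a rough illustration of the interface requested above, the sketch below treats labels as optional: it always returns descriptive statistics of the scores and adds supervised metrics only when labels are passed. The function `summarize_scores`, its parameters, and the returned keys are hypothetical and do not reflect Merlion's actual API. The pointwise precision/recall is a deliberate simplification and may differ from how Merlion scores anomalies.

```python
from statistics import mean, pstdev


def summarize_scores(scores, threshold, labels=None):
    """Summarize anomaly scores, with or without ground truth.

    Hypothetical helper (not a Merlion API). `labels`, if given, is a
    sequence of 0/1 values aligned with `scores`.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if labels is not None and len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # A point is flagged when the magnitude of its score reaches the threshold.
    flagged = [abs(s) >= threshold for s in scores]

    # Unsupervised descriptive statistics: always available.
    summary = {
        "n_points": len(scores),
        "mean_score": mean(scores),
        "std_score": pstdev(scores),
        "max_abs_score": max(abs(s) for s in scores),
        "alert_rate": sum(flagged) / len(scores),
    }

    # Supervised metrics: only computed when ground truth is passed.
    if labels is not None:
        tp = sum(1 for f, y in zip(flagged, labels) if f and y)
        fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
        fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
        summary["precision"] = tp / (tp + fp) if tp + fp else 0.0
        summary["recall"] = tp / (tp + fn) if tp + fn else 0.0
    return summary


scores = [0.1, -0.4, 3.5, 0.2, 4.1, -0.3]
unsupervised = summarize_scores(scores, threshold=3.0)
supervised = summarize_scores(scores, threshold=3.0, labels=[0, 0, 1, 0, 0, 0])
print(unsupervised)
print(supervised)
```

Keeping both paths behind one entry point means callers without labels still get a report, and callers with labels get the same report plus the supervised fields.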