`ml3-drift` is an open source AI library that provides seamless integration of drift detection algorithms and techniques into data science workflows. It does so through three main modules:
- Monitoring Algorithms: a collection of univariate and multivariate drift detection algorithms, for both batch and online settings. Some are implemented from scratch, while others are wrappers around existing libraries.
- Framework Integrations: components that integrate drift detection algorithms into existing Machine Learning and AI frameworks, such as `scikit-learn` and `transformers` (Hugging Face). This lets developers add drift detection to their existing pipelines with minimal code changes.
- Distribution Analyzers: a set of tools for analyzing the distribution of data and detecting drifts in a given dataset.
Monitoring algorithms are the building blocks of the library. They expose a common interface (`fit` / `detect`) that makes them easy to use and swappable across different contexts.
The other modules build on these algorithms, but each algorithm can also be used on its own. This is particularly useful when you want to experiment with different algorithms or use them in a custom way.
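Because all algorithms share this interface, swapping one for another is a one-line change. Here is a minimal sketch, reusing the constructor arguments that appear in the examples further down this README:

```python
import numpy as np

from ml3_drift.monitoring.algorithms.batch.chi_square import ChiSquareAlgorithm
from ml3_drift.monitoring.algorithms.batch.ks import KSAlgorithm

rng = np.random.default_rng(0)
reference_data = rng.normal(0, 1, 1000)
current_data = rng.normal(0.3, 1, 1000)

# Any monitoring algorithm exposing fit / detect can be dropped in here.
detector = KSAlgorithm(p_value=0.05)
# detector = ChiSquareAlgorithm(p_value=0.05)  # e.g. for categorical features

detector.fit(reference_data)            # store the reference distribution
result = detector.detect(current_data)  # compare new data against the reference
```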
The table below summarizes the algorithms we currently support, along with their characteristics. We will add more in the future; let us know if you are interested in a specific algorithm!
| Algorithm | Mode | Data Type | Data Dimensionality | Notes |
|---|---|---|---|---|
| Kolmogorov-Smirnov (KS) Test | Batch | Continuous | Univariate | Uses the SciPy `ks_2samp` method |
| Chi-Square | Batch | Categorical | Univariate | Uses the SciPy `chi2_contingency` method |
| Bonferroni Correction Algorithm | Batch | Continuous / Categorical | Multivariate | Wraps a univariate monitoring algorithm and applies the Bonferroni correction to perform multivariate detection. |
| ADWIN | Online | Continuous | Univariate | Wraps the River library class |
| KSWIN | Online | Continuous | Univariate | Wraps the River library class |
| PageHinkley | Online | Continuous | Univariate | Wraps the River library class |
Here is a simple example of how to use the `KSAlgorithm` detector in a standalone fashion:
```python
import numpy as np

from ml3_drift.monitoring.algorithms.batch.ks import KSAlgorithm
from ml3_drift.callbacks.base import print_callback

# Create some sample data
np.random.seed(42)
reference_data = np.random.normal(0, 1, 1000)
current_data = np.random.normal(0.5, 1, 1000)

# Initialize the drift detector. It's possible to pass a list of callbacks
# that will be executed when a drift is detected.
drift_detector = KSAlgorithm(callbacks=[print_callback])

# Fit the detector on the reference data
drift_detector.fit(reference_data)

# Detect drift on the current data
drift = drift_detector.detect(current_data)
```
The example callback we provided will simply print a message when a drift is detected. For instance:
```
Drift Detected, drift info: {'test_statistic': 4.2252283893369713e-26, 'statistic_threshold': 0.005}
```
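Callbacks are plain callables, so you can plug in your own logic (logging, alerting, metric emission). Here is a minimal sketch, assuming only that the callback receives the drift-info object as its first argument (its exact fields are defined in `ml3_drift/callbacks/models.py`):

```python
import logging

from ml3_drift.callbacks.base import print_callback
from ml3_drift.monitoring.algorithms.batch.ks import KSAlgorithm

logger = logging.getLogger("drift")


# Custom callback: receives the drift-info object as its first argument.
def log_drift(drift_info):
    logger.warning("Drift detected: %s", drift_info)


# Callbacks can be combined freely.
drift_detector = KSAlgorithm(callbacks=[print_callback, log_drift])
```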
These are the frameworks we currently support. We will add more in the future; let us know if you are interested in a specific framework!
| Framework | How | Example |
|---|---|---|
| `scikit-learn` | Provides a scikit-learn compatible drift detector that integrates easily into existing scikit-learn pipelines. | Mixed data monitoring |
| `transformers` (by Hugging Face) | A minimal wrapper for the `Pipeline` object that looks like a `Pipeline` and behaves like a `Pipeline`, but also monitors the output of the wrapped `Pipeline`. Works with any feature extraction pipeline, for both images and text. | Text data monitoring |
`ml3-drift` framework components are designed to be easily integrated into your existing code; you should be able to use them with minimal changes. Here is a simple example with `scikit-learn`:
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

from ml3_drift.callbacks.base import print_callback
from ml3_drift.monitoring.algorithms.batch.ks import KSAlgorithm
from ml3_drift.sklearn.base import SklearnDriftDetector

# Define your pipeline as usual, but also add a drift detector.
# The detector accepts a list of functions to be called when a drift is detected.
# The first argument of the function is a dataclass containing some information
# about the drift (check it out in ml3_drift/callbacks/models.py).
drift_detector = KSAlgorithm(callbacks=[print_callback])

pipeline = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),
        # Wrap the drift detector in the SklearnDriftDetector
        # to make it compatible with sklearn pipelines.
        ("monitoring", SklearnDriftDetector(drift_detector)),
        ("model", DecisionTreeRegressor()),
    ]
)

# When fitting the pipeline, the drift detector will
# save the training data as reference data.
# No effect on the model training.
pipeline = pipeline.fit(X_train, y_train)

# When making predictions, the drift detector will
# check if the incoming data is similar to the reference data
# and execute the callback you specified if a drift is detected.
predictions = pipeline.predict(X_test)
```
The example callback we provided will simply print a message when a drift is detected. For instance:
```
Drift Detected, drift info: {'test_statistic': 3.35710076793659e-13, 'statistic_threshold': 0.005}
```
You can find other examples in the examples folder. For more information, please refer to the documentation.
This module provides tools for identifying distribution shifts within a given dataset. It helps you understand the data and highlights potential issues that might arise when using it to train a model.
We currently provide two analyzers: one batch-based, the other online-based. They work in a similar way but take slightly different approaches. Both accept a pair of monitoring algorithms, one for continuous features and one for categorical features, but:
- the Batch analyzer splits the dataset into subsets of a given size and compares each subset with the others in order to identify macro-batches of data belonging to different distributions. It is also able to identify recurring distributions, i.e. a distribution that appears multiple times (but not contiguously) in the dataset.
- the Online analyzer processes the dataset sequentially, creating sliding windows of contiguous samples. If a drift is detected, the dataset is "split" and the algorithms are reset on the new data. The output is a more granular view of the distribution changes over time; however, it is not able to identify recurring distributions. Note that this class hasn't been tested yet, so use it at your own risk :).
This is a simple example of how to use the Batch analyzer:
```python
from ml3_drift.analysis.analyzer.batch import BatchDataDriftAnalyzer
from ml3_drift.monitoring.algorithms.batch.bonferroni import (
    BonferroniCorrectionAlgorithm,
)
from ml3_drift.monitoring.algorithms.batch.chi_square import ChiSquareAlgorithm
from ml3_drift.monitoring.algorithms.batch.ks import KSAlgorithm

continuous_algo = BonferroniCorrectionAlgorithm(algorithm=KSAlgorithm(p_value=0.05))
categorical_algo = BonferroniCorrectionAlgorithm(
    algorithm=ChiSquareAlgorithm(p_value=0.05)
)

analyzer = BatchDataDriftAnalyzer(
    continuous_monitoring_algorithm=continuous_algo,
    categorical_monitoring_algorithm=categorical_algo,
    batch_size=50,
)

X = ...
y = ...

# Run the analysis
report = analyzer.analyze(
    X,
    y=y,
    continuous_columns=[0, 1],
    categorical_columns=[2, 3],
    y_categorical=False,
)
print(report)
```
The output is a report containing the indexes of the samples that partition the dataset into different distributions, along with an indication of recurring distributions. For instance:
```
# Indicates that the dataset is composed of 3 different distributions.
# Also, the first and the third distributions are similar.
Report(concepts=[(0, 299), (300, 599), (600, 899)], same_distributions={0: [2]})
```
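Here is a short sketch of how such a report might be consumed, assuming `Report` exposes the `concepts` and `same_distributions` attributes printed above (continuing the example, with `report` returned by `analyzer.analyze`):

```python
# Each concept is a (start, end) pair of row indexes; in the example above
# the boundaries appear to be inclusive.
for i, (start, end) in enumerate(report.concepts):
    print(f"Distribution {i}: rows {start} to {end} ({end - start + 1} samples)")

# same_distributions maps a distribution index to the later distributions
# that look statistically similar (i.e. recurring distributions).
for first, recurring in report.same_distributions.items():
    print(f"Distribution {first} reappears as distribution(s) {recurring}")
```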
`ml3-drift` is available on PyPI and supports Python versions from 3.10 to 3.13, inclusive.
All integrations with external libraries are managed through extra dependencies. The plain `ml3-drift` package comes without any of these dependencies, which means you need to specify the framework you want to use when installing the package. If you are just experimenting, you can instead install the package with all the available extras.
You can use pip:
```bash
pip install ml3-drift[all]          # install all the dependencies
pip install ml3-drift[sklearn]      # install only the sklearn dependency
pip install ml3-drift[huggingface]  # install only the huggingface dependency
```

or `uv`:

```bash
uv add ml3-drift --all-extras         # install all the dependencies
uv add ml3-drift --extra sklearn      # install only the sklearn dependency
uv add ml3-drift --extra huggingface  # install only the huggingface dependency
```
Machine Learning algorithms rely on the assumption that the data used during training comes from the same distribution as the data seen in production.
However, this assumption rarely holds true in the real world, where conditions are dynamic and constantly evolving. These distributional changes, if not addressed properly, can lead to a decline in model performance. This, in turn, can result in inaccurate predictions or estimations, potentially harming the business.
Drift Detection, often referred to as Monitoring, is the process of continuously tracking the performance of a model and the distribution of the data it is operating on. The objective is to quickly detect any changes in data distribution or behavior, so that corrective actions can be taken in a timely manner.
Is this yet another drift detection library? Not really. There are many great open source drift detection libraries out there (`nannyml`, `river`, `evidently`, just to name a few), but our goal is a bit different. While we also offer some algorithm implementations (even though some of them are just wrappers around these libraries), we mainly focus on integrating drift detection practices into existing ML and AI workflows. During our experience in the field, we observed a lack of standardization in drift detection APIs and misalignments with common ML interfaces. We want to fill this gap with a library that is easy to use, flexible, and easily integrated into existing, working systems.
Hopefully, this won't be the 15th competing standard.
We welcome contributions to `ml3-drift`! Since we are at a very early stage, we are looking forward to feedback, ideas, and bug reports. Feel free to open an issue if you have any questions or suggestions.
These are the steps you need to follow to set up your local development environment.
We use `uv` as the package manager and `just` as the command runner. Once you have both installed, you can clone the repository and run the following command to set up your development environment:
```bash
just dev-sync
```
The previous command will install all optional dependencies. If you want to install only one of them, run:
```bash
just dev-sync-extra extra-to-install
# for instance, just dev-sync-extra sklearn
```
Make sure you install the pre-commit hooks by running:
```bash
just install-hooks
```
To format your code, lint it and run tests, you can use the following command:
```bash
just validate
```
Note that tests run according to the installed libraries: for instance, if you don't have scikit-learn installed, all tests that require it will be skipped.
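If you contribute tests for an optional integration, the usual way to get this behavior is a conditional skip. Here is a minimal pytest illustration of the pattern (not necessarily the exact mechanism used in this test suite):

```python
import pytest

# Skip every test in this module if scikit-learn is not installed.
pytest.importorskip("sklearn")


def test_sklearn_detector_is_importable():
    from ml3_drift.sklearn.base import SklearnDriftDetector

    assert SklearnDriftDetector is not None
```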
This project is licensed under the terms of the Apache License Version 2.0. For more details, please refer to the LICENSE file. All contributions to this project will be distributed under the same license.
This project was originally developed at ML cube and has been open-sourced to benefit the ML community, from which we deeply welcome contributions.
While `ml3-drift` provides easy-to-use, well-integrated drift detection algorithms, companies requiring enterprise-grade monitoring, advanced analytics, and insights capabilities might be interested in trying out our product, the ML cube Platform.
The ML cube Platform (website, docs) is a comprehensive end-to-end ModelOps framework that helps you trust your AI models and GenAI applications by providing functionalities such as data and model monitoring, drift root cause analysis, performance-safe model retraining, and LLM security. It can be used both during the development phase of your models and in production, to ensure that your models perform as expected and to quickly detect and understand any issues that may arise.
If you'd like to learn more about our product or find out how we can help with your AI projects, visit our websites or contact us at [email protected].