[ENH] Repository benchmarking #3026
Conversation
I want to add a web page tutorial similar to: https://docs.scipy.org/doc/scipy/dev/contributor/benchmarking.html
This is a further example of how we could use this with our estimators. Here is a benchmark for KMeans:

```python
class KMeansBenchmark(EstimatorBenchmark):
    ks = [2, 4, 8]
    inits = ["random", "kmeans++"]
    distances = ["euclidean", "dtw"]
    average_methods = ["mean", "ba"]

    # Extend the base grid
    params = EstimatorBenchmark.params + [ks, inits, distances, average_methods]
    param_names = EstimatorBenchmark.param_names + [
        "k", "init", "distance", "average_method"
    ]

    def _build_estimator(self, k, init, distance, average_method) -> BaseEstimator:
        return aeon_clust.TimeSeriesKMeans(
            n_clusters=k,
            init=init,
            distance=distance,
            averaging_method=average_method,
            n_init=1,
            random_state=1,
        )
```

I have defined a base estimator benchmark class that just generates data and provides fit/predict timing methods:

```python
class EstimatorBenchmark(Benchmark, ABC):
    # Base grid (shared across all estimators)
    shapes = [
        (10, 1, 10),
        (100, 1, 100),
        (10, 3, 10),
        (100, 3, 100),
    ]

    # Subclasses will append their own grids to these:
    params = [shapes]
    param_names = ["shape"]

    def setup(self, shape, *est_params):
        # Data
        self.X_train = make_example_3d_numpy(*shape, return_y=False, random_state=1)
        self.X_test = make_example_3d_numpy(*shape, return_y=False, random_state=2)
        # Pre-fit once for predict timing
        self.prefit_estimator = self._build_estimator(*est_params)
        self.prefit_estimator.fit(self.X_train)

    def time_fit(self, shape, *est_params):
        est = self._build_estimator(*est_params)  # fresh each run
        est.fit(self.X_train)

    def time_predict(self, shape, *est_params):
        self.prefit_estimator.predict(self.X_test)

    @abstractmethod
    def _build_estimator(self, *est_params) -> BaseEstimator:
        """Return an unfitted estimator configured with the given params."""
        ...
```

For each data shape, a KMeans estimator is run for three values of k, two init strategies, two distances, and two averaging methods. For fit, this produces a table of timings for every parameter combination (example output omitted here).
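For a rough sense of scale (an illustrative snippet, not part of the PR): ASV times each `time_*` method once per combination in the cross product of `params`, so the grid above expands as follows.

```python
# Illustrative only: count the cases ASV generates from the KMeansBenchmark
# grid above. ASV runs each time_* method once per combination of `params`.
from itertools import product

shapes = [(10, 1, 10), (100, 1, 100), (10, 3, 10), (100, 3, 100)]
ks = [2, 4, 8]
inits = ["random", "kmeans++"]
distances = ["euclidean", "dtw"]
average_methods = ["mean", "ba"]

combinations = list(product(shapes, ks, inits, distances, average_methods))
print(len(combinations))  # 4 * 3 * 2 * 2 * 2 = 96 cases for time_fit alone
```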
I like the look of it. I don't suppose it also profiles? I would very much like to be able to profile algorithms.
Love this ❤️
benchmarks/benchmarks/clustering.py (outdated)
```python
ks = [2, 4, 8]
inits = ["random", "kmeans++"]
distances = tuple(DISTANCES_DICT.keys())  # all supported distances
distances = ["euclidean"]
```
You probably want to remove the distances overwrite before the merge.
I've had some thoughts since writing this benchmark about how we should handle models that can have a lot of different parameters. I'll write a new comment below outlining my thoughts after writing a few benchmarks.
otherwise. Some of the benchmarking features in `spin` also tell ASV to use the aeon
compiled by `spin`. To run the benchmarks, you will need to install the "dev"
What is `spin`?
Ah, I need to remove this from the docs; that's something SciPy uses and I decided we probably don't need it.
Just going to put this here for future reference on how ASV runs benchmarks: General ASV settings / rules:
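To make this concrete, a benchmark class can tune how ASV runs it through class attributes. The sketch below is a hedged illustration only: the class and values are arbitrary examples, and the attribute semantics are ASV's own (documented in the ASV benchmark-writing guide), not anything defined by this PR.

```python
class ExampleTunedBenchmark:
    """Sketch of per-benchmark ASV settings; values here are arbitrary."""

    # Parameter grid: ASV runs every time_* method for each combination.
    params = [[100, 1000]]
    param_names = ["n_timepoints"]

    # Timing-related attributes read by ASV itself.
    timeout = 120       # seconds before ASV abandons a single benchmark
    warmup_time = 0.1   # seconds of warm-up before timing starts
    number = 10         # iterations per timing sample (0 lets ASV decide)
    repeat = 5          # number of samples collected per case
    min_run_count = 2   # minimum number of runs required per case

    def setup(self, n_timepoints):
        self.data = list(range(n_timepoints))

    def time_sum(self, n_timepoints):
        sum(self.data)
```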
I also wanted to define a small set of initial guidelines for how we should conduct benchmarking in aeon.

What should we benchmark?
Any public-facing function where monitoring performance would be valuable. It may also be useful to benchmark functions that are used frequently internally.

How should we benchmark?
When designing benchmarks, be mindful of runtime. We should also consider the input data shapes, although the range does not need to be huge. For example, for estimators and distances, I currently use the following shapes:

```python
(10, 1, 10),
(10, 1, 1000),
(50, 1, 100),
(10, 3, 10),
(10, 3, 1000),
(50, 3, 100),
```

Due to how ASV runs benchmarks, even with relatively modest changes in the number of cases (e.g., up to a few thousand time series), this range is sufficient to detect performance changes across different dataset sizes. I am currently only testing with the NumPy format; other formats are fine to include if you believe they could show different performance characteristics.
Overall, I agree with what you said above. My five cents:
Reference Issues/PRs
What does this implement/fix?
This PR adds performance benchmarking capabilities (runtime-focused) to the repository, enabling us to measure improvements or regressions over time and easily compare performance between branches. This allows PRs to be accompanied by clear, reproducible performance data and visualisations.
Benchmarking is implemented using airspeed-velocity (asv), the same tool used by SciPy (docs). Our configuration closely follows SciPy’s, with some of their utility methods adapted for aeon.
An example benchmark has been added for the distance module. Example output tables and graphs are included in the comments below.

New dependency
asv (airspeed-velocity), the benchmarking tool itself.
Usage example
Once merged into main, you can define benchmark classes to measure performance across shapes, datasets, or algorithms. For example, here's the Euclidean distance benchmark, which lives at benchmarks/benchmarks/distance.py:
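A hedged sketch of what such a benchmark class could look like is shown below; the class name, shapes, and data generation are illustrative assumptions rather than the PR's actual code.

```python
# Illustrative sketch only -- not the actual contents of
# benchmarks/benchmarks/distance.py from this PR.
import numpy as np

from aeon.distances import euclidean_distance


class EuclideanDistanceBenchmark:
    # (n_channels, n_timepoints) shapes to benchmark (illustrative values)
    params = [[(1, 100), (1, 1000), (3, 100), (3, 1000)]]
    param_names = ["shape"]

    def setup(self, shape):
        rng = np.random.default_rng(1)
        self.x = rng.standard_normal(shape)
        self.y = rng.standard_normal(shape)

    def time_euclidean_distance(self, shape):
        euclidean_distance(self.x, self.y)
```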
Running benchmarks
From the benchmarks directory:
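As an aside, with a stock asv setup benchmarks are started with `asv run`, and a subset can be selected via the `--bench` option; the exact command used in this PR may be wrapped differently.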
Comparing branches
After running benchmarks on main and on your feature branch, you can compare the results:
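With plain asv, the comparison is typically `asv compare <revision1> <revision2>`, or the one-step `asv continuous`, which runs and compares two revisions; again, this PR's exact invocation may differ.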
Example output:
Visualisation
You can generate interactive performance graphs:
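Under the hood, asv builds the static report with `asv publish` and serves it with `asv preview`.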
This launches a local web server with plots and historical trends.
Examples of some pages and graphs (screenshots omitted here).
For a more detailed example, see the sample HTML report that asv links to: https://pv.github.io/numpy-bench/
The web interface provides:
Benefits
Future Work