[ENH] Use n_jobs parameter in KNeighborsTimeSeriesClassifier #2687
base: main
Conversation
Thank you for contributing to
Hi, could you verify the output is the same as before and show that the multithreading does make the classifier faster?
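One way to check equivalence is to run the same per-sample prediction loop serially and through joblib and compare the results. A minimal sketch (the toy 1-NN function here is an illustration, not the classifier's actual code):

```python
import numpy as np
from joblib import Parallel, delayed

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))  # 20 toy series of length 50
means = X.mean(axis=1)

def predict_one(x, X, means):
    # Toy 1-NN by Euclidean distance, returning the neighbour's mean.
    d = np.linalg.norm(X - x, axis=1)
    d[d == 0] = np.inf  # ignore the exact self-match
    return means[np.argmin(d)]

serial = np.array([predict_one(x, X, means) for x in X])
parallel = np.array(
    Parallel(n_jobs=2, backend="threading")(
        delayed(predict_one)(x, X, means) for x in X
    )
)
assert np.allclose(serial, parallel)  # identical output, any n_jobs
```

Timing the two variants (e.g. with `timeit`) on a larger data set would show whether the parallel path is actually faster.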
preds[i] = preds[i] / np.sum(preds[i])
return preds
self._check_is_fitted()
not needed
Should I also remove it from line 158?
Ah, I did not notice it was already there! Yes, it should be redundant; this is handled in the base class.
In #2478 I've put some benchmarks; the entries there cover three data sets with 1 or 8 cores, using only Aeon's builtin DTW.
For the small GunPoint data set, the overhead is larger than the gains from parallelization. I expect that for ED, data sets would need to be larger to get a performance increase (if that's even possible) compared to DTW. Equivalence is tested in
Thanks. Could you edit the n_jobs docstring and consider implementing the backend parameter found in other threaded classifiers? You don't have to do the second, but it would be a nice addition I think.
preds[i] = preds[i] / np.sum(preds[i])
return preds
preds = Parallel(n_jobs=self.n_jobs, backend=self.parallel_backend)(
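The backend-parameter pattern from other threaded estimators amounts to storing both settings in `__init__` and forwarding them to joblib. A minimal sketch (the `ToyEstimator` class and its `predict` body are illustrative placeholders, not aeon code):

```python
from joblib import Parallel, delayed

class ToyEstimator:
    """Sketch: forward n_jobs and a backend choice to joblib.Parallel."""

    def __init__(self, n_jobs=1, parallel_backend=None):
        self.n_jobs = n_jobs
        # None lets joblib pick its default; otherwise e.g. "loky",
        # "threading", or "multiprocessing".
        self.parallel_backend = parallel_backend

    def predict(self, values):
        # Placeholder work item; the real estimator would dispatch
        # its per-sample prediction function here.
        return Parallel(n_jobs=self.n_jobs, backend=self.parallel_backend)(
            delayed(abs)(v) for v in values
        )

est = ToyEstimator(n_jobs=2, parallel_backend="threading")
print(est.predict([-1, 2, -3]))  # [1, 2, 3]
```

Exposing the backend matters because the best choice depends on the workload: numba-compiled distance functions release the GIL, so "threading" avoids process-spawn and pickling overhead.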
Use check_n_jobs for both functions.
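For context, a helper like check_n_jobs normalizes the user-facing n_jobs value into a positive worker count before it reaches joblib. A minimal sketch of the usual convention (aeon's actual utility lives in its validation module and its exact semantics may differ; the negative-value handling below is an assumption following the common joblib convention):

```python
import os

def check_n_jobs(n_jobs):
    """Convert an n_jobs argument to a positive worker count (sketch).

    None maps to 1, and negative values count back from the number of
    CPUs, so -1 means "all cores" and -2 means "all but one".
    """
    n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1
    if n_jobs < 0:
        return max(1, n_cpus + 1 + n_jobs)
    return max(1, n_jobs)

# e.g. check_n_jobs(-1) resolves to the full CPU count,
# while check_n_jobs(None) and check_n_jobs(0) resolve to 1.
```

Calling this once at the top of both `_predict` and `_predict_proba` keeps the two methods' behaviour consistent for inputs like -1 or None.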
Reference Issues/PRs
Fixes #2478. See also #2545 and #2578.
What does this implement/fix? Explain your changes.
Uses the n_jobs parameter in _predict and predict_proba of KNeighborsTimeSeriesClassifier. Parallelization is done in these methods instead of _kneighbors to potentially allow a speedup through upper bounding the distance.
PR checklist
For all contributions