Improve performance of color distance calculations by kernel fusion #809
+254
−119
Description
This MR only fuses the many separate kernels used by each of the "color distance" functions into a single elementwise kernel per function. This is expected to improve performance substantially, both by reducing kernel launch overhead and because a single pass through the image uses memory bandwidth far more efficiently than many separate kernel calls.
No change in behavior is expected (existing tests must continue to pass).
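To illustrate the idea, here is a minimal sketch using the simplest color distance, CIE76 (the Euclidean distance in LAB space). It is not the kernel code from this MR: the function names `delta_e_cie76_unfused` and `delta_e_cie76_fused` and the kernel name are hypothetical, and the fused variant assumes CuPy and a CUDA-capable GPU are available. The first function performs each array operation as its own pass over the data, which on a GPU array library means one kernel launch and one temporary array per operation. The second does all of the arithmetic for a pixel inside one `cupy.ElementwiseKernel`.

```python
import numpy as np


def delta_e_cie76_unfused(lab1, lab2):
    """Reference version: every arithmetic step is a separate array pass.

    With GPU arrays, each subtraction, multiplication, addition and the
    final sqrt would be its own kernel launch with its own temporary.
    """
    dL = lab1[..., 0] - lab2[..., 0]
    da = lab1[..., 1] - lab2[..., 1]
    db = lab1[..., 2] - lab2[..., 2]
    return np.sqrt(dL * dL + da * da + db * db)


def delta_e_cie76_fused(lab1, lab2):
    """Fused sketch: one elementwise kernel, one pass over the image.

    Assumes lab1 and lab2 are floating-point CuPy arrays of shape
    (..., 3). The import is local so the reference function above can
    still be used on machines without CuPy.
    """
    import cupy as cp  # assumption: CuPy and a CUDA GPU are available

    kernel = cp.ElementwiseKernel(
        "T L1, T a1, T b1, T L2, T a2, T b2",  # per-pixel inputs
        "T out",                               # per-pixel output
        """
        T dL = L1 - L2;
        T da = a1 - a2;
        T db = b1 - b2;
        out = sqrt(dL * dL + da * da + db * db);
        """,
        "delta_e_cie76_sketch",  # hypothetical kernel name
    )
    # Channel slices are strided views, so no copies are made here.
    return kernel(
        lab1[..., 0], lab1[..., 1], lab1[..., 2],
        lab2[..., 0], lab2[..., 1], lab2[..., 2],
    )
```

Both functions should produce the same values for the same input. The difference is that the fused version reads each pixel once and writes one output, with no intermediate arrays.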
Benchmark Results
I added benchmarks for these functions and compared the results before and after this change.
The acceleration values are the speedup relative to the scikit-image implementation.
For a pair of 32-bit LAB images of shape: (512, 512, 3)
For a pair of 32-bit LAB images of shape: (3840, 2160, 3)