threaded version of sideof(::Point, ::Ring) is significantly slower than serial for small rings #988
Comments
Really nice idea. Should we experiment with option (4) first? It is less disruptive and easy to try. I understand that we can simply do something like |
@juliohm I did try a bit with OhMyThreads but it does not seem to help much here simply with @tasks (also you can't break early easily within Polyester on the other hand seems to overall reduce allocations and bring minimal overhead for small number of points and speedup even compared to Are you willing to have the PR use Polyester or for some reason you don't want to depend on it (or use |
@disberd in that case I think it is safer to proceed with option (1). We never know for how long these optimized threading packages will be maintained (e.g., LoopVectorization.jl).
OK, I'll do the PR with (1) but just post here for reference the timing I get with Threaded (with
|
The default multithreading for sideof(::Point, ::Ring) seems to pay off only for rings with more than 1000 points (the exact number depends on Threads.nthreads() and probably on the system and Julia version).

Given that the number of threads used for this cannot be controlled without forcing the same number of threads for the whole Julia session, I think it would be good to default to the serial version for rings with 1000 points or fewer. This can significantly speed up operations like point in GeometrySet where the geometry set is composed of many small PolyAreas (as in the case of NaturalEarth borders at 110m resolution).

I'd be willing to help if you are interested. Alternative implementations are:
- ChunkSplitter for making sure the threaded chunks have at least, e.g., 1000 elements. This still has some threading overhead but does not require as much refactoring.
- @batch from Polyester, which has lower threading overhead and allows specifying the minimum batch size.

Benchmarks
Here are some benchmarks comparing the latest release (v0.48.1, threaded) with v0.46.1 (serial):
Threaded (v0.48.1) - Windows - Julia 1.10 - 8 threads
Threaded (v0.48.1) - Windows - Julia 1.11 - 8 threads
Threaded/Serial (v0.48.1) - Windows - Julia 1.11 - 1 thread
Serial (v0.46.1) - Windows - Julia 1.11 - 8 threads
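
For reference, the serial fallback proposed above could look roughly like the sketch below. It is only an illustration using Base Julia: the names crossing, crossings, inside and SERIAL_THRESHOLD are hypothetical and not part of the package, the ring is assumed to be a plain vector of (x, y) tuples, and the test is a simplified in/out crossing count rather than the actual sideof implementation (points lying exactly on the ring are not handled). The 1000-point cutoff is the value suggested by the timings in this issue and would likely need tuning per machine.

```julia
# Hypothetical sketch: none of these names come from the package itself.
const SERIAL_THRESHOLD = 1000  # assumed cutoff, taken from the numbers in this issue

# Contribution of one ring segment (a -> b) to the crossing count of point p.
function crossing(p, a, b)
    (px, py), (ax, ay), (bx, by) = p, a, b
    # The segment must straddle the horizontal line through p ...
    (ay > py) == (by > py) && return 0
    # ... and cross it strictly to the right of p.
    xint = ax + (py - ay) * (bx - ax) / (by - ay)
    return px < xint ? 1 : 0
end

# Sum the crossings over the segments whose start index is in `idx`.
function crossings(p, ring, idx)
    n = length(ring)
    s = 0
    for i in idx
        s += crossing(p, ring[i], ring[mod1(i + 1, n)])
    end
    return s
end

# Even-odd point-in-ring test: serial for small rings, chunked tasks otherwise.
function inside(p, ring; threshold=SERIAL_THRESHOLD)
    n = length(ring)
    nt = Threads.nthreads()
    if n <= threshold || nt == 1
        # Small ring (or single thread): task overhead would dominate the work.
        return isodd(crossings(p, ring, 1:n))
    end
    # Large ring: contiguous chunks of roughly `threshold` segments or more,
    # never more chunks than threads.
    nchunks = min(nt, n ÷ threshold)
    chunks = Iterators.partition(1:n, cld(n, nchunks))
    tasks = [Threads.@spawn crossings(p, ring, c) for c in chunks]
    return isodd(sum(fetch, tasks))
end

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
inside((0.5, 0.5), square)  # serial path, since the ring has only 4 points
```

Capping the number of chunks at n ÷ threshold is the same idea as the minimum chunk size mentioned for ChunkSplitter and the minimum batch size mentioned for Polyester: each task gets enough segments to amortize the cost of spawning it.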