[WIP] add Rasters.jl benchmarks #18
base: main
avoid the `-` splitting
This allows exactextractr to be installed in a separate environment, so it doesn't provoke a version conflict with sf and stars via gdal. This also segments each language and creates dependency trees so life is a bit easier.
conforming with all the other benchmarks
Edited the first post now that the PR is functionally complete from the Rasters.jl end.
@asinghvi17, let's hold off on this, because it uses a different dataset. A separate issue is that, depending on the dataset, {fasterize} can be significantly slower than GDAL. I have another question: in the "zonal" task, are exact zonal statistics (like in {exactextractr}) used, or approximate ones like in the other packages?
It's approximate, like the other packages. Now that I've seen exactextractr is exact, I'll add that option in a few months! But I think it will be even faster after that in most cases, because it would be best done by switching to an online stats approach (at least where that is possible, e.g. for sum/prod/mean). Currently … So for #15 it would be really nice to have … (It would also be good for … Another thing is that there are things Rasters.jl and GDAL can do that fasterize can't, and actually a lot that Rasters can do that GDAL can't either, like arbitrary functions (even functions like median that need to sort) and custom objects/number types. It would be good to cover some of those, or at least the things GDAL can do that fasterize can't.)
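A minimal sketch of what an "online stats" approach means for zonal statistics, in Python purely for illustration (this is not Rasters.jl code, and `OnlineStats`, `zonal_online` and `zonal_median` are hypothetical names): count, sum, product and mean can be updated one pixel at a time with constant state per zone, whereas median has to keep every value in the zone and sort it.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class OnlineStats:
    """Running statistics for one zone, updated per pixel without storing values."""
    count: int = 0
    total: float = 0.0
    product: float = 1.0
    mean: float = 0.0

    def push(self, x: float) -> None:
        self.count += 1
        self.total += x
        self.product *= x
        # Incremental mean update: no second pass over the data is needed.
        self.mean += (x - self.mean) / self.count


def zonal_online(pixels: Iterable[Tuple[int, float]]) -> Dict[int, OnlineStats]:
    """Single pass over (zone_id, value) pairs, keeping O(1) state per zone."""
    stats: Dict[int, OnlineStats] = {}
    for zone, value in pixels:
        stats.setdefault(zone, OnlineStats()).push(value)
    return stats


def zonal_median(pixels: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Median is not an online statistic: all values per zone are kept and sorted."""
    buckets: Dict[int, List[float]] = {}
    for zone, value in pixels:
        buckets.setdefault(zone, []).append(value)
    return {zone: statistics.median(vals) for zone, vals in buckets.items()}
```

The memory difference is the design point: `zonal_online` holds a handful of numbers per zone no matter how many pixels it sees, while `zonal_median` grows with the number of pixels in each zone.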
FWIW: Ideally, it would be useful to compare the performance of the current GDAL algorithm and the scan line algorithm at the C++ level, since what I presented in #15 is quite limited. If the latter algorithm turned out to be significantly better, it would be worth implementing it directly in GDAL so that all packages could benefit from it. And as you mentioned, {fasterize} has some limitations, e.g. it only works with {raster} objects and polygons, and it has fewer options than GDAL. Here is a related issue in the GDAL repository: OSGeo/gdal#7200 CC: @mdsumner, who is also involved in this topic.
I was working on fasterize today ... Btw, see this related effort here, and the discussion six hours from now:
Well, Rasters won't benefit from a faster GDAL rasterize, as we only use GDAL for I/O and for gdalwarp. But it would be good to have detailed comparisons of these algorithms. I put a bunch of work into optimising the scanline in Rasters, but there will be places where it is slower than fasterize; a few nice diagrams of the performance space would really help us understand where things stand.
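For reference, a rough Python sketch of scanline polygon rasterization, assuming unit cells with the origin at (0, 0), a cell-center inclusion rule and even-odd filling. It is not the Rasters.jl, GDAL or fasterize implementation, and `scanline_rasterize` is a hypothetical name. The idea is that each row only needs the sorted edge crossings; whole spans of cells are then filled without testing each pixel against the polygon.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def scanline_rasterize(polygon: List[Point], nrows: int, ncols: int) -> List[List[int]]:
    """Burn a polygon into an nrows x ncols grid of unit cells with origin (0, 0).

    Row r covers y in [r, r + 1). A cell is set to 1 when its center lies
    inside the polygon under the even-odd rule.
    """
    grid = [[0] * ncols for _ in range(nrows)]
    n = len(polygon)
    for row in range(nrows):
        yc = row + 0.5  # y coordinate of the cell centers in this row
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
            # Half-open test: horizontal edges are skipped and a vertex shared
            # by two edges is only counted once.
            if (y0 <= yc < y1) or (y1 <= yc < y0):
                crossings.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Pairs of crossings bound the inside spans; fill each span directly
        # with the cells whose centers satisfy left <= col + 0.5 < right.
        for left, right in zip(crossings[0::2], crossings[1::2]):
            start = max(0, math.ceil(left - 0.5))
            end = min(ncols, math.ceil(right - 0.5))
            for col in range(start, end):
                grid[row][col] = 1
    return grid
```

Per row this costs one pass over the edges plus a sort of the crossings, regardless of how many cells end up filled. A tuned version would also pre-sort the edges by y instead of scanning all of them for every row.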
I almost forgot about this, but for exact zonal statistics we have
I imagined we could do that in the line-burning phase, computing the coverage of each pixel the line touches instead of just burning it. But … We also have subsampling coverage in Rasters.
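To make the subsampling idea concrete, a hedged Python sketch: each unit cell is sampled on a k x k grid of points, the fraction of points inside the polygon approximates that cell's coverage, and the zonal statistic is weighted by it. This approximates what {exactextractr} computes exactly from the geometry. The function names and the default `k=4` are assumptions for illustration, not Rasters.jl's API.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Grid = List[List[float]]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting: toggle on each edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        if (y0 <= y < y1) or (y1 <= y < y0):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside


def subsampled_coverage(polygon: List[Point], nrows: int, ncols: int, k: int = 4) -> Grid:
    """Approximate the covered fraction of each unit cell (origin (0, 0)) with k x k samples."""
    coverage = [[0.0] * ncols for _ in range(nrows)]
    for row in range(nrows):
        for col in range(ncols):
            hits = 0
            for i in range(k):
                for j in range(k):
                    # Sample points sit at the centers of a k x k sub-grid of the cell.
                    if point_in_polygon(col + (j + 0.5) / k, row + (i + 0.5) / k, polygon):
                        hits += 1
            coverage[row][col] = hits / (k * k)
    return coverage


def coverage_weighted_mean(values: Grid, coverage: Grid) -> float:
    """Zonal mean where each pixel counts in proportion to its coverage fraction."""
    num = sum(v * c for v_row, c_row in zip(values, coverage) for v, c in zip(v_row, c_row))
    den = sum(c for c_row in coverage for c in c_row)
    return num / den
```

For example, a rectangle from x = 0 to x = 1.25 over a 1 x 2 grid gives coverages of 1.0 and 0.25, so values of 10 and 40 average to 16 with weighting, while a pixel-center rule would ignore the second cell and report 10. Larger `k` tightens the approximation at the cost of k² point tests per cell.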
I noticed there is a mistake in the title. Instead of "Benchmark vector operations" it should be "Benchmark raster operations". Also here: https://github.com/rafaqz/Rasters.jl/blob/main/README.md#performance
Yeah, I think @asinghvi17 has reused the Makie.jl code from the vector benchmarks.
@asinghvi17 it might also be nice to call it "Rasters.jl" in the label rather than "rasters_jl"? People I've shown it to were confused by that.
We'll still keep pixi for dependency and environment management, but it was impossible to actually run a benchmark suite with it now, so I've made the bash script call into pixi instead.
Platform Info:
Kinda weird that the other timings got worse; I'm not sure what changed, if anything (NDVI especially, that's just a broadcast). And aggregate was far faster than resample, but maybe we were benchmarking nearest rather than mean, so there's some work to do on mean.
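To illustrate why those cases behave so differently, a small Python sketch (hypothetical `ndvi` and `aggregate` helpers, not the Rasters.jl functions): NDVI is a single element-wise pass over two bands, while aggregating with "mean" reads every pixel of each block, and "nearest" reads only one pixel per output cell. Here "nearest" simply takes the pixel at offset `factor // 2` within each block, an arbitrary choice for this sketch.

```python
from typing import List

Grid = List[List[float]]


def ndvi(nir: Grid, red: Grid) -> Grid:
    """Element-wise (NIR - Red) / (NIR + Red): one pass, no neighbourhood access."""
    return [
        [(n - r) / (n + r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]


def aggregate(grid: Grid, factor: int, method: str = "mean") -> Grid:
    """Downsample by an integer factor, dropping any trailing partial block.

    "nearest" reads one pixel per output cell; "mean" reads all factor**2 pixels.
    """
    out_rows, out_cols = len(grid) // factor, len(grid[0]) // factor
    out: Grid = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            r0, c0 = i * factor, j * factor
            if method == "nearest":
                # Arbitrary choice for this sketch: the pixel at offset factor // 2.
                row.append(grid[r0 + factor // 2][c0 + factor // 2])
            elif method == "mean":
                total = sum(
                    grid[r0 + a][c0 + b] for a in range(factor) for b in range(factor)
                )
                row.append(total / (factor * factor))
            else:
                raise ValueError(f"unknown method: {method}")
        out.append(row)
    return out
```

With `factor = 10`, "mean" touches 100 input pixels per output cell against 1 for "nearest", so a benchmark that silently used nearest would look far faster than a true mean aggregation.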
This PR adds benchmarks for Rasters.jl, and adds a `pixi` environment file to manage the installations of R, Python, Julia, and packages. I added the pixi file because there's a version conflict if you try to install exactextractr and stars in the same environment: they require different versions of GDAL. So now each package has a separate environment.
The way to run this is still that you just run `run-benchmarks`.