[WIP] add Rasters.jl benchmarks #18

asinghvi17 · 2024-09-23T20:50:02Z

This PR adds benchmarks for Rasters.jl and a pixi environment file that manages the installation of R, Python, Julia, and their packages.

I added the pixi file because there's a version conflict if you try to install exactextractr and stars in the same environment: they require different versions of GDAL. So now each package has a separate environment.

To run the benchmarks, you still just run run-benchmarks.

asinghvi17 · 2024-09-24T21:56:11Z

Should I add #15 too while I'm at it, @kadyb?

asinghvi17 · 2024-09-24T22:17:03Z

Edited the first post now that the PR is functionally complete from the Rasters.jl end.

kadyb · 2024-09-25T10:13:36Z

> should I add #15 too while I'm at it @kadyb?

@asinghvi17, let's hold off on this, because it uses a different dataset. A separate issue is that, depending on the dataset, {fasterize} can be significantly slower than GDAL.

I have another question: in the "zonal" task, are exact zonal statistics (as in {exactextractr}) used, or approximate ones like in the other packages?

rafaqz · 2024-09-25T10:58:11Z

It's approximate, like the other packages. Now that I've seen that exactextractr is exact, I'll add that option in a few months!

But I think it will be even faster after that in most cases, because it would be best done together with switching to an online-statistics approach (at least where that is possible, e.g. for sum/prod/mean).
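For readers unfamiliar with the term, here is a minimal Python sketch (not Rasters.jl code) of an online-statistics reduction: each pixel value that falls inside a zone is folded into a running accumulator as it is visited, so no masked copy of the raster has to be built first. The class name `OnlineStats` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OnlineStats:
    """Single-pass accumulator: each value is seen once and never stored."""
    n: int = 0
    total: float = 0.0
    product: float = 1.0
    mean: float = 0.0

    def push(self, x: float) -> None:
        self.n += 1
        self.total += x
        self.product *= x
        # Incremental mean update: new_mean = old_mean + (x - old_mean) / n
        self.mean += (x - self.mean) / self.n


if __name__ == "__main__":
    stats = OnlineStats()
    for v in [2.0, 4.0, 6.0]:  # e.g. the pixel values inside one zone
        stats.push(v)
    print(stats.n, stats.total, stats.product, stats.mean)
```

This only works for reductions that can be updated one value at a time; something like a median still needs all the values collected and sorted.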

Currently, zonal in Rasters.jl is just a nice but very basic shortcut for applying a function to the result of mask and crop over each geometry; it's the rasterization machinery under mask that is fast.
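A rough Python sketch of that mask-and-crop shortcut, assuming the per-geometry masks are already available as boolean grids (in Rasters.jl they would come from the rasterization step; here they are simply given). The function `zonal` below is a hypothetical stand-in, not the Rasters.jl API.

```python
from statistics import mean, median
from typing import Callable, List, Sequence


def zonal(values: Sequence[Sequence[float]],
          masks: Sequence[Sequence[Sequence[bool]]],
          f: Callable[[List[float]], float]) -> List[float]:
    """For each geometry mask: crop to its bounding box, keep masked cells, apply f."""
    results = []
    for mask in masks:
        rows = [i for i, row in enumerate(mask) if any(row)]
        cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
        if not rows:  # geometry touches no cells
            results.append(float("nan"))
            continue
        # "crop": only scan the window spanned by the mask
        picked = [values[i][j]
                  for i in range(rows[0], rows[-1] + 1)
                  for j in range(cols[0], cols[-1] + 1)
                  if mask[i][j]]  # "mask": keep only cells inside the geometry
        results.append(f(picked))
    return results


if __name__ == "__main__":
    values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    T, F = True, False
    top_left = [[T, T, F], [T, T, F], [F, F, F]]
    right_col = [[F, F, T], [F, F, T], [F, F, T]]
    print(zonal(values, [top_left, right_col], mean))    # [3.0, 6.0]
    print(zonal(values, [top_left, right_col], median))  # [3.0, 6.0]
```

Because `f` receives the collected values, any reduction works here, including ones like median that need to sort; the cost is materializing those values for every geometry.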

So for #15 it would be really nice to have mask and rasterize benchmarks here too from my perspective!

It would also be good for rasterize to have a range of datasets with different kinds of geometries, with varying node densities and target raster resolutions, to get a clear picture of the tradeoffs. I also think fasterize will be close to Rasters.jl in some cases and slower in others.

Another thing is that there are things Rasters.jl and GDAL can do that fasterize can't, and actually a lot that Rasters can do that GDAL can't either, like arbitrary functions (even functions like median that need to sort) and custom objects/number types. It would be good to cover some of those, at least the things GDAL can do that fasterize can't.

kadyb · 2024-09-25T12:43:55Z

FWIW: ideally, it would be useful to compare the performance of the current GDAL algorithm and the scanline algorithm at the C++ level, since what I presented in #15 is quite limited. If the latter algorithm turned out to be significantly better, it would be worth implementing it directly in GDAL so that all packages could benefit from it. And as you mentioned, {fasterize} has some limitations, e.g. it only works with {raster} objects and polygons, and it has fewer options than GDAL. Here is a related issue in the GDAL repository: OSGeo/gdal#7200

CC: @mdsumner, who is also involved in this topic.

mdsumner · 2024-09-25T13:33:33Z

I was working on fasterize today ...

Btw, see this related effort, and the discussion happening six hours from now:

https://github.com/developmentseed/warp-resample-profiling

https://discourse.pangeo.io/t/pangeo-showcase-geospatial-reprojection-in-python-2024-whats-available-and-whats-next/4531

rafaqz · 2024-09-25T13:51:24Z

Well, Rasters won't benefit from a faster GDAL rasterize, as we only use GDAL for I/O and for gdalwarp. But it would be good to have detailed comparisons of these algorithms. I put a bunch of work into optimising the scanline in Rasters, but there will be places where it is slower than fasterize; a few nice diagrams of the performance space would really help us understand where things are at.
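For reference, a textbook even-odd scanline fill in Python; it is not the Rasters.jl, GDAL, or fasterize implementation. It assumes unit cells with the grid origin at (0, 0), row index increasing with y, and that a cell is burned when its center lies inside the polygon. The name `scanline_fill` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def scanline_fill(poly: Sequence[Point], nrows: int, ncols: int) -> List[List[bool]]:
    """Burn a polygon into an nrows x ncols grid of unit cells (even-odd rule)."""
    grid = [[False] * ncols for _ in range(nrows)]
    n = len(poly)
    for row in range(nrows):
        y = row + 0.5  # this scanline passes through the pixel centers of `row`
        crossings = []
        for k in range(n):
            (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % n]
            # Half-open test: horizontal edges are skipped and a vertex shared
            # by two edges is counted only once.
            if (y0 <= y < y1) or (y1 <= y < y0):
                crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Cells between each pair of crossings are inside.
        for a, b in zip(crossings[0::2], crossings[1::2]):
            # Columns whose centers (col + 0.5) lie in [a, b).
            start = max(0, math.ceil(a - 0.5))
            stop = min(ncols, math.ceil(b - 0.5))
            for col in range(start, stop):
                grid[row][col] = True
    return grid


if __name__ == "__main__":
    square = [(1, 1), (3, 1), (3, 3), (1, 3)]
    for r in scanline_fill(square, 4, 4):
        print("".join("#" if c else "." for c in r))
```

The work per row is proportional to the number of edges plus the number of filled cells, which is why node density and target resolution both matter when comparing rasterizers.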

asinghvi17 · 2024-09-26T16:04:31Z

I almost forgot about this, but for exact zonal statistics we have GO.coverage (GeometryOps.jl), which efficiently computes the area of a rectangle covered by a polygon. I think the rectangle has to be axis-aligned, though, which may present a problem for affine spaces or matrix lookups.
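The thread doesn't show how GO.coverage works internally. A classic way to get the covered area of an axis-aligned rectangle, swapped in here purely for illustration, is Sutherland–Hodgman clipping of the polygon against the rectangle followed by the shoelace formula; dividing by the pixel area gives the coverage fraction that exact zonal statistics weight each pixel by. All names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _clip(poly: Sequence[Point], inside, intersect) -> List[Point]:
    """One Sutherland-Hodgman pass against a single half-plane."""
    out: List[Point] = []
    for i in range(len(poly)):
        prev, cur = poly[i - 1], poly[i]
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))
    return out


def _x_cut(x0: float, keep_right: bool):
    inside = (lambda p: p[0] >= x0) if keep_right else (lambda p: p[0] <= x0)

    def intersect(p: Point, q: Point) -> Point:
        t = (x0 - p[0]) / (q[0] - p[0])
        return (x0, p[1] + t * (q[1] - p[1]))
    return inside, intersect


def _y_cut(y0: float, keep_above: bool):
    inside = (lambda p: p[1] >= y0) if keep_above else (lambda p: p[1] <= y0)

    def intersect(p: Point, q: Point) -> Point:
        t = (y0 - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y0)
    return inside, intersect


def shoelace_area(poly: Sequence[Point]) -> float:
    s = 0.0
    for i in range(len(poly)):
        (xa, ya), (xb, yb) = poly[i - 1], poly[i]
        s += xa * yb - xb * ya
    return abs(s) / 2.0


def rect_coverage(poly: Sequence[Point], xmin: float, xmax: float,
                  ymin: float, ymax: float) -> float:
    """Area of the rectangle [xmin, xmax] x [ymin, ymax] covered by a simple polygon."""
    for inside, intersect in (_x_cut(xmin, True), _x_cut(xmax, False),
                              _y_cut(ymin, True), _y_cut(ymax, False)):
        poly = _clip(poly, inside, intersect)
        if not poly:
            return 0.0
    # Concave input can leave zero-width connecting edges; they add no area.
    return shoelace_area(poly)


if __name__ == "__main__":
    tri = [(0, 0), (2, 0), (0, 2)]
    print(rect_coverage(tri, 0, 1, 0, 1), rect_coverage(tri, 1, 2, 0, 1))
```

The axis-aligned restriction mentioned above shows up directly here: each clipping pass is a simple comparison on x or y, which would no longer hold for a rotated (affine) pixel.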

rafaqz · 2024-09-26T16:29:10Z

I imagined we could do that in the line-burning phase, getting the coverage for each pixel the line touches instead of burning it.

But zonal is doing a lot very fast, so it needs to be very carefully tuned to its purpose to avoid a slowdown of one or two orders of magnitude.

We also have subsampling coverage in Rasters.
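How Rasters.jl implements subsampling coverage isn't described here. One common approach, sketched below under that assumption, estimates the covered fraction of a pixel by testing a k × k grid of sample points against the polygon. `point_in_polygon` and `subsample_coverage` are hypothetical names.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going in +x from (x, y)."""
    inside = False
    for i in range(len(poly)):
        (x0, y0), (x1, y1) = poly[i - 1], poly[i]
        if (y0 > y) != (y1 > y):  # edge straddles the ray's y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def subsample_coverage(poly: Sequence[Point], xmin: float, ymin: float,
                       size: float, k: int) -> float:
    """Approximate covered fraction of the square pixel starting at (xmin, ymin),
    using a k x k grid of sample points at sub-cell centers."""
    step = size / k
    hits = sum(
        point_in_polygon(xmin + (i + 0.5) * step, ymin + (j + 0.5) * step, poly)
        for i in range(k) for j in range(k)
    )
    return hits / (k * k)


if __name__ == "__main__":
    left_half = [(0, 0), (0.5, 0), (0.5, 1), (0, 1)]
    print(subsample_coverage(left_half, 0, 0, 1, 4))
```

Accuracy improves with k, but the cost grows with k², which is the tradeoff against exact clipping.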

kadyb · 2025-02-05T12:11:01Z

I noticed there is a mistake in the title. Instead of "Benchmark vector operations" it should be "Benchmark raster operations".

Also here: https://github.com/rafaqz/Rasters.jl/blob/main/README.md#performance

rafaqz · 2025-02-05T14:33:17Z

Yeah, I think @asinghvi17 has reused the Makie.jl code from the vector benchmarks.

rafaqz · 2025-02-05T14:36:09Z

@asinghvi17 it might also be nice to call it "Rasters.jl" in the label rather than "rasters_jl"? People I have shown it to were confused by that.

asinghvi17 · 2025-02-12T23:58:48Z

New benchmark image

asinghvi17 · 2025-02-13T00:05:48Z

Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 96 × Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
RAM: 16 GB

rafaqz · 2025-02-13T11:24:00Z

Kind of weird that the other timings got worse; I'm not sure what changed, if anything (NDVI especially, since that's just a broadcast).
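For context, NDVI is computed per pixel as (NIR - Red) / (NIR + Red), so that benchmark is essentially one elementwise (broadcast) operation over two bands. A minimal pure-Python sketch; the function name `ndvi` and the choice to return NaN where both bands are zero are assumptions for illustration.

```python
import math
from typing import List, Sequence


def ndvi(red: Sequence[float], nir: Sequence[float]) -> List[float]:
    """Elementwise NDVI; NaN where NIR + Red == 0 to avoid dividing by zero."""
    return [(n - r) / (n + r) if (n + r) != 0 else math.nan
            for r, n in zip(red, nir)]


if __name__ == "__main__":
    print(ndvi([0.1, 0.2, 0.0], [0.5, 0.2, 0.0]))
```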

And aggregate was far faster than resample, but maybe we were benchmarking nearest rather than mean, so there's some work to do on mean.
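To illustrate why the method matters for the downsample timings: a mean aggregate has to read every cell in each factor × factor block, while a nearest-style pick reads one cell per block. A pure-Python sketch under those assumptions; it does not reproduce Rasters.jl's `aggregate` or GDAL's resampling, and picking the cell at offset `factor // 2` is an arbitrary choice made here.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def aggregate_mean(grid: Grid, factor: int) -> List[List[float]]:
    """Mean of each non-overlapping factor x factor block; partial blocks are dropped."""
    nrows, ncols = len(grid) // factor, len(grid[0]) // factor
    return [
        [
            sum(grid[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(ncols)
        ]
        for r in range(nrows)
    ]


def pick_one_per_block(grid: Grid, factor: int) -> List[List[float]]:
    """Nearest-style downsample: one representative cell per block, no arithmetic."""
    nrows, ncols = len(grid) // factor, len(grid[0]) // factor
    off = factor // 2
    return [[grid[r * factor + off][c * factor + off] for c in range(ncols)]
            for r in range(nrows)]


if __name__ == "__main__":
    grid = [[4 * r + c for c in range(4)] for r in range(4)]
    print(aggregate_mean(grid, 2))      # [[2.5, 4.5], [10.5, 12.5]]
    print(pick_one_per_block(grid, 2))  # [[5, 7], [13, 15]]
```

Per output cell the mean reads factor² inputs versus one for the pick, so timing a nearest resample against a mean aggregate is not a like-for-like comparison.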

asinghvi17 and others added 11 commits April 12, 2024 14:39

Add initial files for Rasters.jl

9f29bac

Update gitignore

97c4a36

get Rasters.jl benchmarks to working order

d888a6a

get Rasters.jl kinda working

258b1c7

move from Rasters-jl to Rasters_jl

c00040b

avoid the `-` splitting

Use Rasters.extract instead of indexing

c990831

Add Pixi and Julia projects to lock versions + make this reproducible

963b54e

pixi still not working

e93077e

brief updates to files

9c922b0

add Julia local project.toml

bbb3437

tune bencmarks

cfde288

asinghvi17 added 5 commits September 24, 2024 15:07

Update pixi files to have separate envs per package

ea7c536

This allows exactextractr to be installed in a separate environment, so it doesn't provoke a version conflict with sf and stars via gdal. This also segments each language and creates dependency trees so life is a bit easier.

Make benchmarks work, add plotting

5f5a2bc

Plotting should never load unless not in benchmarking

152a328

Make load eager

a8abb00

conforming with all the other benchmarks

Set crop to the minimum value of the rest of the array

f320a93

asinghvi17 and others added 6 commits September 24, 2024 15:22

Clean up, write tif not nc

8b590bf

allow for potential rasterize test

0df68ad

add Rasters entry to readme

11b383f

clean up zonal

a4c5231

comments

5a95463

bugfix nvdi

00806cd

tiemvanderdeure reviewed Sep 25, 2024

rasters_jl/downsample.jl (outdated, resolved)

rafaqz reviewed Sep 25, 2024

rasters_jl/write.jl (outdated, resolved)

rafaqz mentioned this pull request Sep 27, 2024

Seems like a lot of duplicate effort tiemvanderdeure/SpeciesDistributionModels.jl#12 (open)

asinghvi17 added 3 commits February 4, 2025 14:34

use aggregate (native Julia) instead of resample (GDAL)

4a30726

use LZW compression in write

26816b1

Switch from sum to mean in downsample example

a8bce32

asinghvi17 commented Feb 5, 2025

rasters_jl/plotting.jl (outdated, resolved)

asinghvi17 added 4 commits February 12, 2025 20:19

Go back to a bash script, remove pixi benchmarking

ce9de4c

We'll still keep pixi for dependency and environment management, but it was impossible to actually run a benchmark suite with it now, so I've made the bash script call into pixi instead.

Minor bugfixes

8649823

rename rasters_jl to rasters

2ae6cef

rename folder

0825ba2
