memory-safe alternative to fasterizeDT? #5
Hi @mikoontz-vp, I don't think I'll add a `rasterizeDT()`. That said, you might want to test out this function that I put together a few years ago; at the time, it was about 6 times faster than the alternatives then available. It's been a while since I looked into the relative speed of the R world's various rasterization functions, so if I were you I might also have a look at the newer options in {terra} and {stars}. And if either of those works faster/better than the function in the gist I pointed you to, please do let me know what you've learned about them!
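A minimal benchmarking sketch for such a comparison, assuming `polys` (an sf polygon layer with a numeric column `"value"`) and `r_template` (a `raster::RasterLayer` template) are already in memory; the field and template choices are illustrative:

```r
# Compare the main polygon-rasterization options on the same inputs.
bench::mark(
  fasterize = fasterize::fasterize(polys, r_template, field = "value"),
  terra     = terra::rasterize(terra::vect(polys), terra::rast(r_template),
                               field = "value"),
  stars     = stars::st_rasterize(
    polys["value"],
    stars::st_as_stars(sf::st_bbox(polys),
                       nx = ncol(r_template), ny = nrow(r_template))
  ),
  check = FALSE  # the outputs are different raster classes
)
```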
Okay, fair enough! My hack (the only thing I've gotten to work so far, and what made me think that a `rasterizeDT()` might be helpful!) is to use `exactextractr::exact_extract()` to get cell coverage while retaining the cell id (pretty fast), write a raster to disk the same size as the (very large) template with each cell value being the index of that cell, and then use a data.table of the "covered" cell numbers resulting from the `exact_extract()` call as the lookup for assigning values back into that on-disk raster.

I had already left issues for the {stars}, {sf}, and {terra} authors about how they wouldn't work for very large rasters. I had forgotten about your gist! I'll try that next. Thanks so much!
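A sketch of that hack, assuming `template` (a large terra SpatRaster) and `polys` (an sf polygon layer with a numeric `value` column); names like `covered_dt` are illustrative, and the final assignment would itself need to be chunked for templates larger than RAM:

```r
# 1. Get the covered cells per polygon, keeping cell indices.
cov <- exactextractr::exact_extract(template, polys, include_cell = TRUE)

# 2. Stack into one data.table: one row per (polygon, cell) pair.
covered_dt <- data.table::rbindlist(cov, idcol = "poly_id")
covered_dt[, value := polys$value[poly_id]]

# 3. Assign values by cell index into an empty copy of the template.
out <- terra::rast(template)
out[covered_dt$cell] <- covered_dt$value
terra::writeRaster(out, "rasterized.tif", overwrite = TRUE)
```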
Wow, quite a saga! If you want to test Robert's fix without recompiling the whole package, you could use this:

```r
library(terra)
trace("rasterize", signature = c(x = "SpatVector", y = "SpatRaster"), edit = TRUE)
```

Add Robert's one-line fix at the appropriate place and then (if using RStudio) press the Save button to return to the command line. After doing that, any call to `rasterize()` on a SpatVector and a SpatRaster will use the edited method for the rest of the session.
@JoshOBrien, I recently did a benchmark of functions for rasterizing polygons; the fastest were the terra, stars, and GDAL-based approaches. @mikoontz-vp, maybe controlledburn from @mdsumner will be helpful in some way here (but I'm not sure)?

And a little off-topic: @mikoontz-vp, maybe you should divide this 70 GB raster into smaller blocks first and process them in a loop? There is a function for making such tiles.
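The comment above doesn't name the tiling function; one candidate (an assumption, not confirmed by the thread) is `terra::makeTiles()`, which writes a large raster out as smaller on-disk tiles. A rough sketch, with illustrative filenames and tile counts:

```r
# Split a large template into tiles on disk, process each, then reassemble.
library(terra)

big <- rast("big_template.tif")  # illustrative filename

# A coarse 4 x 4 raster over the same extent defines the tile geometry.
tile_geom  <- rast(ext(big), ncols = 4, nrows = 4, crs = crs(big))
tile_files <- makeTiles(big, tile_geom, filename = "tile_.tif")

# Process each tile independently, keeping memory use bounded.
out_files <- vapply(tile_files, function(f) {
  tile <- rast(f)
  # ... rasterize the polygons onto this tile here ...
  out <- paste0("out_", basename(f))
  writeRaster(tile, out, overwrite = TRUE)
  out
}, character(1))

# Reassemble as a single virtual raster if one output is needed.
result <- vrt(out_files)
```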
All of this is super helpful; thank you both so much (love the edit from a "." to a "!", Josh, in describing the saga! 😭). I'm going to close this issue now, as the initial ask (a `rasterizeDT()`) isn't something that will be added to the package.
Thanks to both of you. Not unexpectedly, the function in the gist I linked to (with a bit of additional cleaning up) is comparable in speed to the stars, terra, and GDAL approaches in @kadyb's benchmark.
The crux is definitely memory management. You can use terra or stars now to get tiles, but I prefer doing the parts of workflows as steps in the process. FWIW, hypertidy/grout will give the tiling based on simple inputs; it's just important to find out what the tiling is, if there is one, vs. what you define and use.

FWIW, controlledburn is a way to get the index part of fasterize; you'd still need an efficient way to write out the pixel values in a per-tile way (again, both stars and terra can, AFAIK, instantiate a raster and write-update to it now). fasterize is insanely fast, but it has no way to avoid that huge matrix being in memory. I plan to just put its algorithm into GDAL rather than into various R packages, but there are other compelling aspects of this taking my attention 😆

Apologies to go on, but it seemed worth contextualizing a bit 🙏
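To illustrate the "instantiate a raster and write-update to it" idea: a minimal sketch using terra's block-wise writing helpers, with illustrative dimensions and placeholder values (the real values would come from a per-tile rasterization step):

```r
# Create a raster on disk and fill it block by block, so the full matrix of
# values never has to sit in memory at once.
library(terra)

out <- rast(ncols = 20000, nrows = 20000)  # illustrative size

bs <- writeStart(out, "big_output.tif", overwrite = TRUE)
for (i in seq_len(bs$n)) {
  # Compute values for this block only; here just zeros as a placeholder.
  v <- rep(0, bs$nrows[i] * ncol(out))
  writeValues(out, v, start = bs$row[i], nrows = bs$nrows[i])
}
out <- writeStop(out)
```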
Hello!
I'm wondering how feasible it might be to implement a `rasterizeDT()` in addition to `fasterizeDT()`. Neither fasterize nor fasterizeDT works on very large template rasters because of memory limits, and I wonder whether this would be a good use case for something more like `rasterize()` that can be memory-safe, but with a data.table back end?
Thanks so much for your continued development on this package!
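For scale (and as context for the thread above), fasterize returns a dense in-memory raster, so the template's total cell count drives memory use. A back-of-the-envelope check in R, with purely illustrative dimensions:

```r
# 8 bytes per double-precision cell, so a square template of roughly
# 93,500 x 93,500 cells is on the order of the 70 GB mentioned above.
ncells <- 93500^2
ncells * 8 / 1e9  # ~70 GB, before any intermediate copies
```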