Codec pipeline memory usage #2904
https://github.com/TomAugspurger/zarr-python-memory-benchmark/blob/4039ba687452d65eef081bce1d4714165546422a/sol.py#L41 has a POC, and https://github.com/TomAugspurger/zarr-python-memory-benchmark/blob/4039ba687452d65eef081bce1d4714165546422a/sol.py#L63 shows an example reading a Zstd-compressed dataset. https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/3567246b852d7adacbc10f32a58b0b3f6ac3d50b/reports/memray-flamegraph-sol-read-compressed.html shows that the peak memory usage is roughly the size of the compressed dataset plus the output ndarray (this does all the decompression first; we could do those sequentially to lower the peak memory usage). There are some complications around slices that don't align with zarr chunk boundaries that this ignores, but it's maybe enough to prove that we could do better.
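To make the "sequential" idea concrete, here is a minimal sketch (not the linked POC) of decoding one chunk at a time straight into the output array, so at most one compressed chunk is alive at once. The `fetch_chunk` callable, the flat chunk layout, and the 1-D/aligned-chunk assumptions are hypothetical simplifications; it leans on the fact that numcodecs' `Zstd.decode` accepts an `out` buffer.

```python
# Minimal sketch of sequential decompression into a preallocated output array.
# Peak memory ~ output array + one compressed chunk, instead of output + all
# compressed chunks. Assumes a 1-D array whose length is a multiple of
# chunk_len, and chunks aligned with contiguous slices of `out`.
import numpy as np
from numcodecs import Zstd


def read_zstd_sequential(fetch_chunk, nchunks: int, chunk_len: int, dtype="f8") -> np.ndarray:
    codec = Zstd()
    out = np.empty(nchunks * chunk_len, dtype=dtype)
    for i in range(nchunks):
        compressed = fetch_chunk(i)  # hypothetical: returns the chunk's compressed bytes
        # Decode straight into the right slice of `out`; no per-chunk ndarray
        # is allocated, and the compressed bytes can be freed before the next fetch.
        codec.decode(compressed, out=out[i * chunk_len:(i + 1) * chunk_len])
        del compressed
    return out
```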
Thanks for doing this work @TomAugspurger! Coincidentally, I've been looking at memory overheads for Zarr storage operations across different filesystems (local/cloud), compression settings, and Zarr versions: https://github.com/tomwhite/memray-array
Just reducing the number of buffer copies for aligned slices would be a big win for everyone who uses Zarr, since it would improve performance and reduce memory pressure. Hopefully similar techniques could be used for cloud storage too.
Very cool!
I was wondering about this while looking into the performance of obstore and KvikIO. KvikIO lets the caller provide the buffer to read into.
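For illustration, here is a rough, hypothetical shape of what a caller-provided-buffer read on a Store could look like. This is not zarr-python's current Store API, just a sketch built on Python's file `readinto` for the local case; the class name and method signature are made up.

```python
# Hypothetical "readinto"-style store read: the caller owns the destination
# buffer, so no intermediate bytes object is allocated by the store.
from pathlib import Path


class LocalReadIntoStore:
    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def readinto(self, key: str, out: memoryview, offset: int = 0) -> int:
        """Read the value for `key`, starting at `offset`, directly into `out`.

        Returns the number of bytes written into the caller-provided buffer.
        """
        with open(self.root / key, "rb", buffering=0) as f:
            f.seek(offset)
            return f.readinto(out)
```

The same shape would presumably map onto KvikIO (reading into a device buffer) or object-store range requests, though the async and cloud cases are more involved.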
I wonder if any of the memory management machinery that has been developed for Apache Arrow would be of use here?
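For context, the relevant Arrow machinery as exposed from Python looks roughly like this; whether it maps cleanly onto zarr's buffer abstractions is an open question.

```python
# Arrow exposes explicit memory pools and buffer allocation from Python.
import pyarrow as pa

pool = pa.default_memory_pool()
buf = pa.allocate_buffer(40 * 1024 * 1024)  # a single 40 MB buffer from the pool
view = memoryview(buf)                      # zero-copy view over the Arrow-managed memory
print(pool.bytes_allocated(), pool.max_memory())
```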
I looked into implementing this today and it'll be a decent amount of effort. There are some issues in the interface provided by the codec pipeline ABC. Beyond the codec pipeline, I think we'll also need to update the Store API.
Not the first person! I did make it out alive, but only barely.
We discussed memory usage on Friday's community call. https://github.com/TomAugspurger/zarr-python-memory-benchmark started to look at some stuff.
https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/refs/heads/main/reports/memray-flamegraph-read-uncompressed.html has the memray flamegraph for reading an uncompressed array (400 MB total, split into 10 chunks of 40 MB each). I think the optimal memory usage here is about 400 MB. Our peak memory is about 2x that.
https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/refs/heads/main/reports/memray-flamegraph-read-compressed.html has the zstd compressed version. Peak memory is about 1.1 GiB.
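For anyone reproducing these numbers, the actual benchmark scripts live in the repo linked above; the memray wiring is roughly as follows (the dataset path here is hypothetical).

```python
# Roughly how the flamegraph reports above can be produced.
from memray import Tracker
import zarr

with Tracker("read-uncompressed.bin"):
    z = zarr.open("data/uncompressed.zarr", mode="r")  # hypothetical path to the test array
    _ = z[:]                                           # read the whole array into memory
# then render the HTML report with: memray flamegraph read-uncompressed.bin
```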
I haven't looked too closely at the code, but I wonder if we could be smarter about a few things in certain cases:
- `readinto` directly into (an appropriate slice of) the `out` array. We might need to expand the Store API to add some kind of `readinto`, where the user provides the buffer to read into rather than the store allocating new memory.
- `zstd.decode` takes an output buffer here that we could maybe use. And past that point, maybe all the codecs could reuse one or two buffers, rather than allocating a new buffer for each stage of the codec (one buffer if everything can be done in place, two buffers if something can't)? There's a rough sketch of this buffer-reuse idea below.

I'm not too familiar with the codec pipeline stuff, but will look into this as I have time. Others should feel free to take this if someone gets an itch though. There's some work to be done :)
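As a rough, non-authoritative sketch of the buffer-reuse idea, here is what a two-stage decode (Zstd then Delta, a made-up filter chain) could look like with numcodecs, reusing a single scratch buffer across all chunks and decoding the last stage straight into `out`. numcodecs' Delta still allocates internally, so this only illustrates the pipeline-level reuse, not a complete fix.

```python
# Sketch: reuse one scratch buffer across codec stages and chunks.
# The filter chain and chunk layout are made up for illustration.
import numpy as np
from numcodecs import Delta, Zstd

dtype = "f8"
chunk_len = 100_000
nchunks = 4
delta, zstd = Delta(dtype=dtype), Zstd()

# Test setup only: build some Delta+Zstd encoded chunks to decode.
data = np.arange(chunk_len * nchunks, dtype=dtype)
chunks = [
    zstd.encode(delta.encode(data[i * chunk_len:(i + 1) * chunk_len]))
    for i in range(nchunks)
]

out = np.empty(chunk_len * nchunks, dtype=dtype)
scratch = np.empty(chunk_len, dtype=dtype)      # one scratch buffer, reused for every chunk
for i, compressed in enumerate(chunks):
    zstd.decode(compressed, out=scratch)        # stage 1: decompress into scratch
    target = out[i * chunk_len:(i + 1) * chunk_len]
    delta.decode(scratch, out=target)           # stage 2: un-delta straight into `out`

assert np.array_equal(out, data)
```

Whether the real codec pipeline can be restructured along these lines presumably depends on the interface issues with the codec pipeline ABC mentioned earlier in the thread.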