@ilan-gold (Collaborator) commented Nov 3, 2025

Calling to_native_dtype + __str__ came up as one of the few Python-side CPU-bound hotspots when doing some benchmarking. My use case is quite contrived (generating thousands of WithSubset objects), but I think it's probably worth investigating getting rid of these calls. Some observations:

  1. I wonder if getting the dtype and fill_val could all be wrapped up by just relying on https://docs.rs/zarrs/latest/zarrs/array/struct.Array.html#method.open and then using the values directly (there are probably other benefits to doing this), but I think this is a separate PR.
  2. Regardless, most of this refactor is around removing Basic anyway, so that chunk handling is independent of the ability. I noticed that ChunkRepresentation requires ownership of its arguments, which means we copy per chunk. I'm not sure what would go into making that a reference, but it's no worse than the previous situation, where I think we were generating copies repeatedly anyway, just via PyO3 calling into Python.
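
To illustrate the per-chunk copy in point 2, here is a minimal sketch using hypothetical stand-in types. `FillValue`, `OwnedChunkRepr`, `BorrowedChunkRepr`, and both builder functions are made up for illustration and are not the zarrs API; whether zarrs could expose a borrowing variant at all is an open question, not something this shows.

```rust
// Hypothetical stand-ins for illustration only; these are NOT zarrs types.
#[derive(Debug, Clone, PartialEq)]
struct FillValue(Vec<u8>);

// Owning variant: each instance holds its own copy of the shape and fill value.
struct OwnedChunkRepr {
    shape: Vec<u64>,
    fill_value: FillValue,
}

// Borrowing variant: each instance points at data owned by the caller.
struct BorrowedChunkRepr<'a> {
    shape: &'a [u64],
    fill_value: &'a FillValue,
}

// One allocation + copy of the shape and the fill value for every chunk.
fn build_owned(shape: &[u64], fill: &FillValue, n_chunks: usize) -> Vec<OwnedChunkRepr> {
    (0..n_chunks)
        .map(|_| OwnedChunkRepr {
            shape: shape.to_vec(),
            fill_value: fill.clone(),
        })
        .collect()
}

// No per-chunk copies: every chunk shares the same underlying buffers.
fn build_borrowed<'a>(
    shape: &'a [u64],
    fill: &'a FillValue,
    n_chunks: usize,
) -> Vec<BorrowedChunkRepr<'a>> {
    (0..n_chunks)
        .map(|_| BorrowedChunkRepr { shape, fill_value: fill })
        .collect()
}
```

The trade-off in this sketch is the lifetime parameter: the borrowed form cannot outlive the data it points to, which is likely part of what "making that a reference" would involve.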

The benefit wasn't huge (~5%), but I think going in this direction is good (see point 1).

TODO:

  • Understand our vlen test error messages / warnings re: what we support.

@ilan-gold ilan-gold marked this pull request as draft November 3, 2025 14:49