Use np.load(..., mmap_mode="r") for time_vector in BaseRecording._extra_metadata_from_folder #4608
Open
grahamfindlay wants to merge 2 commits into
…ta_from_folder` `BaseRecording._extra_metadata_from_folder` was eagerly loading a `BinaryFolderRecording`'s time vector, which can be tens of GB on long recordings. I ran into this because `run_sorter_by_property` (I was sorting tetrodes) reconstructs the recording _once per joblib worker_, so every worker was loading the whole time vector, even when only a short frame slice was needed. Another consequence of the eager loading was that `si.load` on a 48h recording with a time_vector was using 1.6GB of memory, even if the time vector was never touched. I switched the load to `mmap_mode="r"`, so memory use no longer scales with both recording duration and number of workers. It is bounded to what's actually touched, and the memmap can be shared by the joblib workers. The cost is that this time vector is read-only, so in-place operations on it would raise an error. Thankfully there was only 1: `TimeSeries.shift_times`! I made it shift in place only when the vector is writeable and fall back to an out-of-place op for the now read-only memmap.
for more information, see https://pre-commit.ci
alejoe91
reviewed
Jun 8, 2026
Comment on lines +276 to +282
    if rs.time_vector.flags.writeable:
        # If the time_vector is writeable, shift in-place to avoid a copy.
        rs.time_vector += shift
    else:
        # If the time_vector is a memmap from `np.load(..., mmap_mode='r')`,
        # in-place modification would error, so we shift a writable copy.
        rs.time_vector = rs.time_vector + shift
Member
Why not simply this?
Suggested change
    - if rs.time_vector.flags.writeable:
    -     # If the time_vector is writeable, shift in-place to avoid a copy.
    -     rs.time_vector += shift
    - else:
    -     # If the time_vector is a memmap from `np.load(..., mmap_mode='r')`,
    -     # in-place modification would error, so we shift a writable copy.
    -     rs.time_vector = rs.time_vector + shift
    + rs.time_vector = rs.time_vector + shift
`BaseRecording._extra_metadata_from_folder` was eagerly loading a `BinaryFolderRecording`'s time vector, which can be tens of GB on long recordings.

I ran into this because `run_sorter_by_property` (I was sorting tetrodes) reconstructs the recording _once per joblib worker_, so every worker was loading the whole time vector, even when only a short frame slice was needed.

Another consequence of the eager loading was that `si.load` on a 48h recording with a time_vector was using 1.6GB of memory, even if the time vector was never touched.

I switched the load to `mmap_mode="r"`, so memory use no longer scales with both recording duration and number of workers. It is bounded to what's actually touched, and the memmap can be shared by the joblib workers.

The cost is that this time vector is read-only, so in-place operations on it would raise an error. Thankfully there was only 1: `TimeSeries.shift_times`! I made it shift in place only when the vector is writeable and fall back to an out-of-place op for the now read-only memmap. (For simplicity, we could scrap the conditional altogether and just keep the out-of-place path, since this is presumably a pretty rare op, but I chose to keep the in-place, no-copy path for best performance.)
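
For reviewers who want to poke at the behavior outside of SpikeInterface, here is a minimal standalone sketch using only NumPy. The file name `times.npy` and the helper `shift_time_vector` are made up for illustration and are not part of this PR; the helper only mirrors the writeable check described above.

```python
import os
import tempfile

import numpy as np


def shift_time_vector(time_vector, shift):
    """Hypothetical helper mirroring the writeable check described above."""
    if time_vector.flags.writeable:
        # Writable array: shift in place, no copy is made.
        time_vector += shift
        return time_vector
    # Read-only array (e.g. a memmap opened with mmap_mode="r"):
    # build a new, writable array instead of modifying the file-backed one.
    return time_vector + shift


# Write a small stand-in time vector to disk (illustrative path only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "times.npy")
np.save(path, np.arange(5, dtype="float64"))

# Lazy load: the data is mapped rather than read eagerly into memory.
lazy_times = np.load(path, mmap_mode="r")

# In-place modification of the read-only memmap fails.
try:
    lazy_times += 10.0
    in_place_failed = False
except ValueError:
    in_place_failed = True

# The fallback path returns a shifted, writable array and leaves the file untouched.
shifted = shift_time_vector(lazy_times, 10.0)

# An ordinary in-memory array still takes the no-copy path.
in_memory = np.zeros(3)
same_object = shift_time_vector(in_memory, 1.0) is in_memory
```

Dropping the conditional, as suggested in review, would correspond to always taking the `time_vector + shift` branch, at the price of one extra allocation for vectors that are already writable.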