Use np.load(..., mmap_mode="r") for time_vector in `BaseRecording._extra_metadata_from_folder` by grahamfindlay · Pull Request #4608 · SpikeInterface/spikeinterface

grahamfindlay · 2026-06-08T00:57:44Z

BaseRecording._extra_metadata_from_folder was eagerly loading a BinaryFolderRecording's time vector, which can be tens of GB on long recordings.

I ran into this because run_sorter_by_property (I was sorting tetrodes) reconstructs the recording once per joblib worker, so every worker was loading the whole time vector, even when only a short frame slice was needed.

Another consequence of the eager loading was that si.load on a 48h recording with a time_vector was using 1.6GB of memory, even if the time vector was never touched.

I switched the load to mmap_mode="r", so memory use no longer scales with both recording duration and number of workers. It is bounded to what's actually touched, and the memmap can be shared by the joblib workers.

The cost is that this time vector is read-only, so in-place operations on it would raise an error.

Thankfully there was only 1: TimeSeries.shift_times! I made it shift in place only when the vector is writeable and fall back to an out-of-place op for the now read-only memmap. (For simplicity, we could scrap the conditional altogether and just keep the out-of-place path, since this is presumably a pretty rare op -- but I chose to keep the in-place, no-copy path for best performance).

…ta_from_folder` `BaseRecording._extra_metadata_from_folder` was eagerly loading a `BinaryFolderRecording`'s time vector, which can be tens of GB on long recordings. I ran into this because `run_sorter_by_property` (I was sorting tetrodes) reconstructs the recording _once per joblib worker_, so every worker was loading the whole time vector, even when only a short frame slice was needed. Another consequence of the eager loading was that `si.load` on a 48h recording with a time_vector was using 1.6GB of memory, even if the time vector was never touched. I switched the load to `mmap_mode="r"`, so memory use no longer scales with both recording duration and number of workers. It is bounded to what's actually touched, and the memmap can be shared by the joblib workers. The cost is that this time vector is read-only, so in-place operations on it would raise an error. Thankfully there was only 1: `TimeSeries.shift_times`! I made it shift in place only when the vector is writeable and fall back to an out-of-place op for the now read-only memmap.

for more information, see https://pre-commit.ci

alejoe91 · 2026-06-08T07:21:18Z

+                if rs.time_vector.flags.writeable:
+                    # If the time_vector is writeable, shift in-place to avoid a copy.
+                    rs.time_vector += shift
+                else:
+                    # If the time_vector is a memmap from `np.load(..., mmap_mode='r')`,
+                    # in-place modification would error, so we shift a writable copy.
+                    rs.time_vector = rs.time_vector + shift


Why not simply this?

Suggested change

if rs.time_vector.flags.writeable:

# If the time_vector is writeable, shift in-place to avoid a copy.

rs.time_vector += shift

else:

# If the time_vector is a memmap from `np.load(..., mmap_mode='r')`,

# in-place modification would error, so we shift a writable copy.

rs.time_vector = rs.time_vector + shift

rs.time_vector = rs.time_vector + shift

to avoid copy no ?

grahamfindlay and others added 2 commits June 7, 2026 19:41

[pre-commit.ci] auto fixes from pre-commit.com hooks

46a717b

for more information, see https://pre-commit.ci

alejoe91 reviewed Jun 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use np.load(..., mmap_mode="r") for time_vector in `BaseRecording._extra_metadata_from_folder`#4608

Use np.load(..., mmap_mode="r") for time_vector in `BaseRecording._extra_metadata_from_folder`#4608
grahamfindlay wants to merge 2 commits into
SpikeInterface:mainfrom
grahamfindlay:bugfix/excessive_eager_time_vector

grahamfindlay commented Jun 8, 2026

Uh oh!

alejoe91 Jun 8, 2026 •

edited

Loading

Uh oh!

samuelgarcia Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

grahamfindlay commented Jun 8, 2026

Uh oh!

alejoe91 Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

samuelgarcia Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

alejoe91 Jun 8, 2026 •

edited

Loading