Skip to content

Use np.load(..., mmap_mode="r") for time_vector in BaseRecording._extra_metadata_from_folder#4608

Open
grahamfindlay wants to merge 2 commits into
SpikeInterface:mainfrom
grahamfindlay:bugfix/excessive_eager_time_vector
Open

Use np.load(..., mmap_mode="r") for time_vector in BaseRecording._extra_metadata_from_folder#4608
grahamfindlay wants to merge 2 commits into
SpikeInterface:mainfrom
grahamfindlay:bugfix/excessive_eager_time_vector

Conversation

@grahamfindlay
Copy link
Copy Markdown
Contributor

BaseRecording._extra_metadata_from_folder was eagerly loading a BinaryFolderRecording's time vector, which can be tens of GB on long recordings.

I ran into this because run_sorter_by_property (I was sorting tetrodes) reconstructs the recording once per joblib worker, so every worker was loading the whole time vector, even when only a short frame slice was needed.

Another consequence of the eager loading was that si.load on a 48h recording with a time_vector was using 1.6GB of memory, even if the time vector was never touched.

I switched the load to mmap_mode="r", so memory use no longer scales with both recording duration and number of workers. It is bounded to what's actually touched, and the memmap can be shared by the joblib workers.

The cost is that this time vector is read-only, so in-place operations on it would raise an error.

Thankfully there was only 1: TimeSeries.shift_times! I made it shift in place only when the vector is writeable and fall back to an out-of-place op for the now read-only memmap. (For simplicity, we could scrap the conditional altogether and just keep the out-of-place path, since this is presumably a pretty rare op -- but I chose to keep the in-place, no-copy path for best performance).

grahamfindlay and others added 2 commits June 7, 2026 19:41
…ta_from_folder`

`BaseRecording._extra_metadata_from_folder` was eagerly loading a
`BinaryFolderRecording`'s time vector, which can be tens of GB on long
recordings.

I ran into this because `run_sorter_by_property` (I was sorting tetrodes)
reconstructs the recording _once per joblib worker_, so every worker was
loading the whole time vector, even when only a short frame slice was
needed.

Another consequence of the eager loading was that `si.load` on a 48h
recording with a time_vector was using 1.6GB of memory, even if the
time vector was never touched.

I switched the load to `mmap_mode="r"`, so memory use no longer scales
with both recording duration and number of workers. It is bounded to
what's actually touched, and the memmap can be shared by the joblib workers.

The cost is that this time vector is read-only, so in-place operations
on it would raise an error.

Thankfully there was only 1: `TimeSeries.shift_times`! I made it shift in
place only when the vector is writeable and fall back to an out-of-place
op for the now read-only memmap.
Comment on lines +276 to +282
if rs.time_vector.flags.writeable:
# If the time_vector is writeable, shift in-place to avoid a copy.
rs.time_vector += shift
else:
# If the time_vector is a memmap from `np.load(..., mmap_mode='r')`,
# in-place modification would error, so we shift a writable copy.
rs.time_vector = rs.time_vector + shift
Copy link
Copy Markdown
Member

@alejoe91 alejoe91 Jun 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply this?

Suggested change
if rs.time_vector.flags.writeable:
# If the time_vector is writeable, shift in-place to avoid a copy.
rs.time_vector += shift
else:
# If the time_vector is a memmap from `np.load(..., mmap_mode='r')`,
# in-place modification would error, so we shift a writable copy.
rs.time_vector = rs.time_vector + shift
rs.time_vector = rs.time_vector + shift

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid copy no ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants