Does spike vector need a full lexsort ?

After trying to fasten the kilosort/phy reading and subselecting units with sevral way:
 * #4502
 * #4579 

It appears that the main slow down is to force the spike vector to be fully lexsorted:

`spikes = spikes[np.lexsort((spikes["unit_index"], spikes["sample_index"], spikes["segment_index"]))]`

It use to be partially lexsorted : 
`spikes = spikes[np.lexsort((spikes["sample_index"], spikes["segment_index"]))]`

This commit recent commit https://github.com/SpikeInterface/spikeinterface/commit/456265c9d48e2e749861660c4bf0e3760027f6a0 has introduced this "need of fully lexsorted".

**In short does 2 spikes with same sample_index need to be ordered by unit_index ?**

**Yes : we need full lexsort**

Pro : 
 * It ensure always an order of the spiek vector because spike amplitude, amplitude scaling, spike location is a vector related to this spike vector.

Cons :
 * it is a big slow down when selecting units 
 * constructing the spike vector (with many spikes) can be very slow


**No : we need partial lexsort**

Pro : 
 * We do not care 2 spikes of same sample_index to be ordered by units for any computation (CCG)
 * We only need to ensure that the spike vector is always constructed/read/cached in the same consistent order (for instance by reading the phy/kilosort spike_clusters.npy)

Cons :
 * some future part of code could rely on the fact this vector is fully lexsorted


My personnal taste is second case.

**The main problem** : the spikeinterface codebase use to be *No : we need partial lexsort* and in february 2025 we switched to *Yes : we need full lexsort* without too much backward compatibility in mind. Zarr is now opening the internal spike vector and force lexsort in the `__init__()` which is a big big slow down but also can break the order of spike amplitude, amplitide scaling, ... it this extension was computed before this change (v 0.104.0) and the analyzer was saved in the zarr format before this version.

I would strongly vote for removing the behavior (force full lexsort) and go back to previous situation. But we need to find a better way for backward compatibility to avoid the same mistake.

@grahamfindlay @alejoe91 @chrishalcrow @yger @tayheau
Any opinion on this ?


PS : the first reader able to find more than 10 typos in this post will have a glass of wine.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does spike vector need a full lexsort ? #4606

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Does spike vector need a full lexsort ? #4606

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions