
ArchishmanSengupta

Summary

-> Replaced the O(m·n) sequential merge scan with a heap-driven algorithm that keeps candidate merges in a binary heap ordered by rank (lowest rank popped first) and updates only the local neighbors on each merge.

-> This yields O(m·log n) behavior, where m is the number of merges and n is the number of initial symbols.

Key changes:

  1. Added a BinaryHeap and a candidate struct to _byte_pair_merge, together with a linked list of live nodes and per-position version counters so that stale heap entries can be detected and skipped.
  2. Local ranks are computed via compute_rank_at, and only the affected neighbors are updated after each merge.
  3. Added targeted unit tests for _byte_pair_merge boundary cases.
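
The changes listed above can be illustrated with a minimal, self-contained sketch. This is not the code in the diff: `byte_pair_merge_sketch`, its return type (a list of token boundaries), the index-based `next`/`prev` arrays, and the `compute_rank_at` signature shown here are assumptions made for illustration. It assumes that a lower rank means higher merge priority and that ties go to the leftmost position.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

type Rank = u32;

/// Rank of the pair formed by the node starting at byte `i` and its live
/// successor, if that byte string is in the vocabulary.
/// `next[i]` is where node `i` ends, which is also where its successor starts.
/// (Hypothetical signature, chosen for this sketch.)
fn compute_rank_at(
    piece: &[u8],
    ranks: &HashMap<Vec<u8>, Rank>,
    next: &[usize],
    i: usize,
) -> Option<Rank> {
    let j = next[i];
    if j >= piece.len() {
        return None; // `i` is the last live node, so there is no pair
    }
    ranks.get(&piece[i..next[j]]).copied()
}

/// Hypothetical heap-driven merge. Returns the token boundaries,
/// e.g. [0, 2, 3] means the tokens are piece[0..2] and piece[2..3].
fn byte_pair_merge_sketch(piece: &[u8], ranks: &HashMap<Vec<u8>, Rank>) -> Vec<usize> {
    let n = piece.len();
    // Linked list over byte offsets: every byte starts as its own node.
    let mut next: Vec<usize> = (1..=n).collect();
    let mut prev: Vec<Option<usize>> = (0..n).map(|i| i.checked_sub(1)).collect();
    // A heap entry is valid only while its stored version matches version[i].
    let mut version = vec![0u32; n];

    // Build the heap in one pass. `Reverse` turns Rust's max-heap into a
    // min-heap on (rank, position, version).
    let initial: Vec<Reverse<(Rank, usize, u32)>> = (0..n)
        .filter_map(|i| compute_rank_at(piece, ranks, &next, i).map(|r| Reverse((r, i, 0))))
        .collect();
    let mut heap = BinaryHeap::from(initial);

    while let Some(Reverse((_, i, ver))) = heap.pop() {
        if version[i] != ver {
            continue; // stale entry: the pair at `i` changed since it was pushed
        }
        // Merge node `i` with its successor `j`, unlinking `j`.
        let j = next[i];
        next[i] = next[j];
        version[j] += 1; // invalidates any entry that starts at `j`
        if next[i] < n {
            prev[next[i]] = Some(i);
        }
        // Only the merged node and its predecessor have a changed pair.
        for k in std::iter::once(i).chain(prev[i]) {
            version[k] += 1;
            if let Some(r) = compute_rank_at(piece, ranks, &next, k) {
                heap.push(Reverse((r, k, version[k])));
            }
        }
    }

    // Walk the surviving nodes to collect the boundaries.
    let mut bounds = Vec::new();
    let mut i = 0;
    while i < n {
        bounds.push(i);
        i = next[i];
    }
    bounds.push(n);
    bounds
}
```

In this sketch, stale entries are never removed from the heap eagerly. They are discarded when popped, which keeps each merge to a constant number of heap operations.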

Complexity:

Before: one linear scan per merge → approximately O(m·n) in the worst case.
After: O(log n) heap operations per merge → O(m·log n), plus O(n) initialization.
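
A rough accounting for the "After" bound, under two assumptions that are not spelled out in the description: each position has at most one current-version entry in the heap, and each merge bumps the version of at most three positions (the merged node, the absorbed node, and the predecessor) while pushing at most two new entries. Every popped entry is then either valid, which triggers one of the m merges, or stale, which means it was invalidated by some merge:

```math
\text{pops} \;\le\; \underbrace{m}_{\text{valid}} + \underbrace{3m}_{\text{stale}} = 4m,
\qquad
T(n, m) \;=\; \underbrace{O(n)}_{\text{heap build}} + 4m \cdot O(\log n) \;=\; O(n + m \log n).
```

The heap never holds more than one current entry per position plus the stale entries, and m ≤ n − 1, so each heap operation stays within O(log n).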
