@kryesh commented Apr 18, 2025

From the Discord conversation:
https://discord.com/channels/908281611840282624/915785344396439552/1341705839668625468

This PR updates LogMergePolicy so that it aims for a specific target number of documents per segment and opportunistically skips intermediate merge operations until that target document count can be reached.
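The grouping idea can be illustrated with a minimal Rust sketch. It is not the PR's actual code, and several details are assumptions: `plan_merges` is a hypothetical function that takes per-segment document counts, treats any segment already holding `target` docs or more as finished, and greedily groups the remaining segments (largest first, an ordering chosen arbitrarily here) until a group's total reaches `target`. A leftover group below the target is simply not merged, which stands in for "skipping intermediate merges".

```rust
/// Hypothetical sketch (not the PR's code): choose merge groups that aim
/// for `target` documents per resulting segment.
fn plan_merges(segment_doc_counts: &[u32], target: u32) -> Vec<Vec<u32>> {
    // Assumption: only segments below the target are merge candidates.
    let mut candidates: Vec<u32> = segment_doc_counts
        .iter()
        .copied()
        .filter(|&n| n < target)
        .collect();
    // Assumption: visit larger segments first (any order keeps the size bound).
    candidates.sort_unstable_by(|a, b| b.cmp(a));

    let mut merges = Vec::new();
    let mut group = Vec::new();
    let mut group_docs: u64 = 0; // u64 so the running total cannot overflow
    for n in candidates {
        group.push(n);
        group_docs += u64::from(n);
        // Close the group as soon as it reaches the target doc count.
        if group_docs >= u64::from(target) {
            merges.push(std::mem::take(&mut group));
            group_docs = 0;
        }
    }
    // Any leftover group is below the target: skip it rather than doing an
    // intermediate merge, and wait for more documents to arrive.
    merges
}

fn main() {
    // Two segments slightly over the target in total become one merge.
    println!("{:?}", plan_merges(&[600, 500], 1000));
    // Two near-target segments form the largest possible group; the tiny
    // leftover segment is not merged yet.
    println!("{:?}", plan_merges(&[999, 999, 5], 1000));
    // Nothing below the target adds up to it, so no merge is planned.
    println!("{:?}", plan_merges(&[1500, 200, 300], 1000));
}
```

Under these assumptions the three calls print `[[600, 500]]`, `[[999, 999]]` and `[]`. The first case also shows the trade-off listed under Cons: an index with just over `target` documents in total can collapse into a single segment.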
Pros:

  • Reduced IO/CPU usage from skipping intermediate merge operations
  • No longer susceptible to huge merge operations that combine many large segments into a single segment many times larger than the target size (previously max_docs_before_merge)
    • The theoretical maximum size of a segment under the updated logic is (target_segment_size * 2) - 2 documents
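One way to see where the (target_segment_size * 2) - 2 bound comes from is a short derivation. It assumes, as a sketch rather than something confirmed from the PR's code, that segments holding $T$ = target_segment_size or more docs are never merge candidates, and that a merge group is closed as soon as its total reaches $T$. Here $n_i$ is the doc count of a candidate segment and $S_k$ is the group's total after its $k$-th (final) segment is added:

```math
\begin{aligned}
n_i &\le T - 1 && \text{every merge candidate is below the target} \\
S_{k-1} &\le T - 1 && \text{the group had not reached } T \text{ before its final segment} \\
S_k = S_{k-1} + n_k &\le (T - 1) + (T - 1) = 2T - 2
\end{aligned}
```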

Cons:

  • If an index holds only slightly more than target_segment_size documents in total, it may be merged into a single segment and therefore not parallelize well at search time
