Reduce thundering herd of reserve fragment ID calls during parallelized compaction #6075

@hamersaw

Description

Currently, when compaction tasks are parallelized (e.g. over large datasets), each task individually calls reserve_fragment_ids. This operation requires a transactional manifest update. When many of these updates occur in parallel, one succeeds and the others must be retried. Each retry incurs a backoff delay, then rereads each manifest written since the version it originally read and reattempts the commit.
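To make the contention pattern concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `Manifest`, `try_commit`, and `simulate` are hypothetical names for illustration and are not Lance APIs, backoff delays are not modeled, and the lock-step scheduling (every pending task reads the same version and commits at once) is a worst case, so it overstates contention relative to a real run where tasks do work between reservations.

```python
from dataclasses import dataclass


@dataclass
class Manifest:
    """Hypothetical stand-in for a dataset manifest (not a Lance type)."""
    version: int = 0
    next_fragment_id: int = 0

    def try_commit(self, read_version: int, num_ids: int) -> bool:
        """Optimistic commit: succeeds only if nothing was committed since read_version."""
        if read_version != self.version:
            return False  # conflict: the caller has to reread and retry
        self.version += 1
        self.next_fragment_id += num_ids
        return True


def simulate(num_tasks: int, concurrency: int) -> tuple[int, int]:
    """Return (write_attempts, final_version) for a lock-step worst case.

    In each round, up to `concurrency` pending tasks read the same manifest
    version and all attempt to commit. Exactly one wins; the rest retry in
    the next round.
    """
    manifest = Manifest()
    pending = num_tasks
    attempts = 0
    while pending > 0:
        contenders = min(concurrency, pending)
        read_version = manifest.version  # every contender reads the same version
        for _ in range(contenders):
            attempts += 1
            if manifest.try_commit(read_version, num_ids=1):
                pending -= 1
    return attempts, manifest.version


if __name__ == "__main__":
    for workers in (1, 24):
        attempts, version = simulate(num_tasks=144, concurrency=workers)
        print(f"workers={workers} attempts={attempts} commits={version}")
```

With a single worker every attempt succeeds, so write attempts equal the number of commits. With concurrent workers, the number of successful commits is unchanged, but every conflicting attempt is wasted work that a real system would additionally pay for with backoff and manifest rereads.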

In my test, parallelized compaction (24 threads) over a 100k-fragment dataset (16 rows per fragment), targeting 144 fragments in the resulting dataset, required 400+ manifest write attempts (144 expected). This resulted in a 1h53m runtime, which is significantly slower than simply rewriting the dataset with 144 fragments (23m55s).
