
Support inserting based on batchsize into shuffler #1369

Open
ayushdg wants to merge 9 commits into NVIDIA-NeMo:main from ayushdg:batched-exact-dedup


ayushdg (Contributor) commented Jan 13, 2026

Description

  • This PR adds support for inserting into a shuffler with a batched method, when the shuffler provides one, and uses it in the ExactDuplicateIdentification stage.
  • Speeds up the exact dedup tests by ~25 s on my machine (from 1:54 to 1:29).
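For reviewers, the dispatch pattern described above (use a batched insert when the shuffler exposes one, otherwise fall back to inserting row by row) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions only: `RowShuffler`, `InMemoryShuffler`, `insert_batch`, and `insert_rows` are hypothetical names, not the NeMo Curator shuffler API or the actual code in this PR.

```python
from typing import Any, Dict, List


class RowShuffler:
    """Hypothetical shuffler that only supports row-by-row inserts."""

    def __init__(self) -> None:
        self.buckets: Dict[int, List[Any]] = {}
        self.insert_calls = 0

    def insert(self, partition_id: int, row: Any) -> None:
        self.insert_calls += 1
        self.buckets.setdefault(partition_id, []).append(row)


class InMemoryShuffler(RowShuffler):
    """Hypothetical shuffler that additionally exposes a batched insert method."""

    def insert_batch(self, partition_id: int, rows: List[Any]) -> None:
        # One call moves a whole batch instead of a single row.
        self.insert_calls += 1
        self.buckets.setdefault(partition_id, []).extend(rows)


def insert_rows(shuffler: Any, partition_id: int, rows: List[Any], batch_size: int) -> None:
    """Hypothetical helper: insert in chunks of batch_size if batching is supported."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch_insert = getattr(shuffler, "insert_batch", None)
    if callable(batch_insert):
        # Batched path: ceil(len(rows) / batch_size) calls in total.
        for start in range(0, len(rows), batch_size):
            batch_insert(partition_id, rows[start : start + batch_size])
    else:
        # Fallback: no batched method available, insert one row at a time.
        for row in rows:
            shuffler.insert(partition_id, row)
```

With 10 rows and `batch_size=4`, the batched path makes 3 insert calls instead of 10. Fewer calls means less per-call overhead; how much that matters in practice depends on the real shuffler backend.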

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
copy-pr-bot (bot) commented Jan 13, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

ayushdg (Contributor, Author) commented Jan 14, 2026

/ok to test 41ba9c0

greptile-apps (bot) left a comment

3 files reviewed, no comments

