Add a Heat aware DistributedSampler for torch usage. #1807

Open
wants to merge 6 commits into main

Conversation

@Berkant03 (Collaborator) commented Feb 24, 2025

Due Diligence

  • General:
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • benchmarks: created for new functionality
    • benchmarks: performance improved or maintained
    • documentation updated where needed

Description

Issue/s resolved: #1789

Changes proposed:

Add a Heat-aware DistributedSampler for PyTorch use cases.
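
For context, here is a minimal sketch of the core idea behind such a sampler: every rank draws the same shuffled index order and keeps only its own slice. The class name, constructor signature, and contiguous-chunk partitioning below are illustrative assumptions, not the API added in this PR.

```python
# Illustrative sketch only -- class name and signature are assumptions,
# not the implementation proposed in this PR.
import torch
from torch.utils.data import Sampler


class NaiveDistributedSampler(Sampler):
    """Hand each of `world_size` ranks an equally sized, disjoint
    slice of the dataset indices, reshuffled per epoch."""

    def __init__(self, dataset, world_size, rank, shuffle=True, seed=0):
        self.dataset = dataset
        self.world_size = world_size
        self.rank = rank
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0
        # ceil division: every rank reports the same length
        self.num_samples = (len(dataset) + world_size - 1) // world_size

    def set_epoch(self, epoch):
        # called by the training loop so each epoch reshuffles differently
        self.epoch = epoch

    def __iter__(self):
        if self.shuffle:
            g = torch.Generator()
            g.manual_seed(self.seed + self.epoch)  # identical order on all ranks
            indices = torch.randperm(len(self.dataset), generator=g).tolist()
        else:
            indices = list(range(len(self.dataset)))
        # pad with leading indices so the list divides evenly across ranks
        indices += indices[: self.num_samples * self.world_size - len(indices)]
        # contiguous chunk for this rank
        start = self.rank * self.num_samples
        return iter(indices[start : start + self.num_samples])

    def __len__(self):
        return self.num_samples
```

Per rank, an instance would then be passed to torch.utils.data.DataLoader via its sampler argument, with set_epoch called once per epoch so the shuffle order varies across epochs.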

Type of change

  • New feature (non-breaking change which adds functionality)

Does this change modify the behaviour of other functions? If so, which?

no

Contributor

Thank you for the PR!


codecov bot commented Feb 24, 2025

Codecov Report

Attention: Patch coverage is 18.51852% with 88 lines in your changes missing coverage. Please review.

Project coverage is 91.62%. Comparing base (23e373b) to head (875643d).
Report is 58 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| heat/utils/data/datatools.py | 18.51% | 88 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1807      +/-   ##
==========================================
- Coverage   92.26%   91.62%   -0.64%     
==========================================
  Files          84       84              
  Lines       12447    12554     +107     
==========================================
+ Hits        11484    11503      +19     
- Misses        963     1051      +88     
| Flag | Coverage Δ |
|---|---|
| unit | 91.62% <18.51%> (-0.64%) ⬇️ |


@Berkant03 added the enhancement (New feature or request) and memory footprint labels, then removed the memory footprint label, on Feb 24, 2025
@ClaudiaComito added this to the 1.6 milestone on Feb 24, 2025

@Berkant03 (Collaborator, Author) commented Mar 17, 2025

When using the normal comm.Bcast, the broadcast only works the first time; the second time it no longer does.

❯ mpirun -np 2 python test.py
0 tensor([2, 4, 3, 0, 1], dtype=torch.int32)
1 tensor([2, 4, 3, 0, 1], dtype=torch.int32)
...
1 tensor([455,   0,   0,   0,  32], dtype=torch.int32)
0 tensor([1, 4, 3, 2, 0], dtype=torch.int32)
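
The original test.py is not included in the thread; a minimal reproduction along these lines (the file contents, the loop count, and the use of Heat's MPI_WORLD wrapper are assumptions) shows the pattern:

```python
# test.py -- assumed reproduction; run with: mpirun -np 2 python test.py
import torch
from heat.core.communication import MPI_WORLD

rank = MPI_WORLD.rank
for _ in range(2):
    if rank == 0:
        perm = torch.randperm(5, dtype=torch.int32)  # fresh permutation each round
    else:
        perm = torch.empty(5, dtype=torch.int32)     # receive buffer
    MPI_WORLD.Bcast(perm, root=0)
    # observed above: round 1 matches on both ranks; in round 2, rank 1
    # prints garbage while rank 0 still holds a valid permutation
    print(rank, perm)
```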

Project status: In Progress
Development

Successfully merging this pull request may close these issues.

Implementation of a Heat Aware DistributedSampler for Interoperability with Pytorch Ecosystem
2 participants