Support TP + FSDPv2 / HSDP or just FSDPv2 / HSDP #3395
base: main
Conversation
Signed-off-by: Mehant Kammakomati <[email protected]>
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Signed-off-by: Mehant Kammakomati <[email protected]>
cc @S1ro1
@kmehant let me know if this works for your PR as expected, but it shouldn't:

```python
model = ...
optimizer = ...(model.parameters(), ...)
model, optimizer = accelerator.prepare(model, optimizer)
```

This should result in higher memory usage, as the optimizer holds references to the original model parameters.
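To make the concern concrete, here is a minimal, hypothetical sketch; the `Linear` model and the identity check are illustrative assumptions, not code from this PR:

```python
# Hypothetical sketch of the concern above: building the optimizer before
# prepare() leaves its param groups pointing at the original, unsharded
# parameters, so FSDP sharding cannot free them.
import torch

model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Record the identities of the parameters the optimizer currently references.
params_before = {id(p) for p in model.parameters()}

# ... accelerator.prepare(model, optimizer) would wrap/shard the model here ...
# If wrapping swaps in new (sharded) parameter tensors, these identities change
# and the optimizer keeps the old full tensors alive -> higher memory usage.
params_after = {id(p) for p in model.parameters()}
print("optimizer still points at original params:", params_before == params_after)
```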
What does this PR do?
prepare_nd_device_mesh
Utility function that extends device mesh creation to any combination of parallelisms. It currently supports any combination of TP and FSDP/HSDP; a sketch of the idea follows.
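For context, here is a hedged sketch of what such a utility might do, built on PyTorch's `init_device_mesh`. The `build_mesh` name, its parameters, and the dimension names are illustrative assumptions, not the PR's actual `prepare_nd_device_mesh` signature:

```python
# Minimal sketch of an n-d device mesh combining TP with FSDP or HSDP.
# Must run under torchrun (init_device_mesh needs a distributed environment).
from torch.distributed.device_mesh import init_device_mesh

def build_mesh(world_size: int, tp_size: int = 1, hsdp_replicate: int = 1):
    """Build a device mesh for (optional) TP plus FSDP or HSDP.

    world_size must be divisible by tp_size * hsdp_replicate; the remaining
    factor becomes the FSDP sharding dimension.
    """
    shard_size = world_size // (tp_size * hsdp_replicate)
    dims, names = [], []
    if hsdp_replicate > 1:      # HSDP: replicate across this outer dimension
        dims.append(hsdp_replicate)
        names.append("ddp")
    dims.append(shard_size)     # FSDP: shard across this dimension
    names.append("fsdp")
    if tp_size > 1:             # TP: innermost dimension
        dims.append(tp_size)
        names.append("tp")
    return init_device_mesh("cuda", tuple(dims), mesh_dim_names=tuple(names))

# e.g. 8 GPUs with TP=2 and FSDP over the remaining 4 ranks:
# mesh = build_mesh(world_size=8, tp_size=2)
# fsdp_mesh, tp_mesh = mesh["fsdp"], mesh["tp"]
```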
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@SunMarc @muellerzr
@kwen2501 from PyTorch