fix deprecated torch.cuda.amp.GradScaler FutureWarning for pytorch 2.4+ #3132

Conversation
Thanks for working on this deprecation. Could you please run `make quality`?
@BenjaminBossan I have finished your requests, would you please trigger the workflows? 🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Strange; when I run `make quality` locally, everything passes:

```
ruff check .
All checks passed!
ruff format --check .
150 files already formatted
doc-builder style src/accelerate docs/source --max_len 119 --check_only
```

@BenjaminBossan I have checked the e583d2e results; it seems this workflow has a bug to be fixed. It indicates all tests passed, but it failed at the last step, `make quality`:
```
shell: /usr/bin/bash -e {0}
env:
  pythonLocation: /opt/hostedtoolcache/Python/3.8.18/x64
  LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.8.18/x64/lib
ruff check .
All checks passed!
ruff format --check .
150 files already formatted
doc-builder style src/accelerate docs/source --max_len 119 --check_only
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.18/x64/bin/doc-builder", line 8, in <module>
    sys.exit(main())
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/doc_builder/commands/doc_builder_cli.py", line 47, in main
    args.func(args)
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/doc_builder/commands/style.py", line 28, in style_command
    raise ValueError(f"{len(changed)} files should be restyled!")
ValueError: 1 files should be restyled!
make: *** [Makefile:17: quality] Error 1
Error: Process completed with exit code 2.
```
It's the doc-builder that is complaining. This is the diff I get from running `doc-builder style src/accelerate docs/source --max_len 119`:

```diff
@@ -127,10 +127,11 @@ def find_executable_batch_size(function: callable = None, starting_batch_size: i
     >>> @find_executable_batch_size(starting_batch_size=128)
-    ... def train(batch_size, model, optimizer): ...
+    ... def train(batch_size, model, optimizer):
+    ...     ...
-    ... train(model, optimizer)
+    >>> train(model, optimizer)
     ```
     """
     if function is None
```
Interesting, but I didn't touch that file; I just updated the code based on your message. @BenjaminBossan, could you please trigger the CI again?
Thanks @Mon-ius. I think one of your previous commits made a change to that file.
@BenjaminBossan I just deleted the file and downloaded the original one; it seems a single blank line made the CI fail. Could you please trigger it again? 🤗
Hi @BenjaminBossan, it seems all checks have passed now, could you please merge it? 🤗
I checked this change with older PyTorch versions and unfortunately it is only supported from 2.3 on. Since we want to support older versions as well, we need to add a torch version check.

A similar argument probably applies to the other grad scalers, like `torch.npu.amp.GradScaler`, but I haven't tested that.
@BenjaminBossan Thanks for your review. I applied a simple PyTorch version check for 2.3 on the GPU device; could you please trigger the CI?

For the NPU device, I currently don't have one to test, but I applied the same strategy in a last separate commit 🤗
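For illustration, here is a minimal sketch of what such a version-gated instantiation looks like for the CUDA path. The standalone helper form and its name are mine, not the PR's code; the `> "2.3"` comparison mirrors the check shown in the diff further below:

```python
import torch
from packaging import version


def make_cuda_grad_scaler(**kwargs):
    # Sketch: build a GradScaler without triggering the PyTorch 2.4+ FutureWarning.
    if version.parse(torch.__version__) > version.parse("2.3"):
        # The unified class takes the device as an explicit argument.
        return torch.amp.GradScaler("cuda", **kwargs)
    # Older releases only ship the device-specific (now deprecated) class.
    return torch.cuda.amp.GradScaler(**kwargs)
```

Either branch returns an object with the same `scale`/`step`/`update` interface, so calling code does not need to change.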
Thanks for the version fixes. I have a few smaller comments.
It would probably make sense to factor out the grad scaler instantiation to a utility function to avoid code duplication, but I'll let Zach be the judge of that once he's back in office. For now, it can stay as is.
src/accelerate/checkpointing.py (Outdated)

```diff
-    scaler (`torch.cuda.amp.GradScaler`, *optional*):
-        An optional gradient scaler instance to save
+    scaler (`torch.amp.GradScaler`, *optional*) for pytorch>2.3:
+        An optional gradient scaler instance to save; for lower version, check `torch.cuda.amp.GradScaler`
```
I think we don't need this extra explanation in the docstring; it should be quite clear as is what this refers to.
src/accelerate/utils/dataclasses.py (Outdated)

```diff
@@ -209,8 +209,9 @@ def register_comm_hook(self, model):
 class GradScalerKwargs(KwargsHandler):
     """
     Use this object in your [`Accelerator`] to customize the behavior of mixed precision, specifically how the
-    `torch.cuda.amp.GradScaler` used is created. Please refer to the documentation of this
-    [scaler](https://pytorch.org/docs/stable/amp.html?highlight=gradscaler) for more information on each argument.
+    `torch.amp.GradScaler` used is created for pytoch>2.3 or `torch.cuda.amp.GradScaler` for lower version. Please
```
Same, let's not overexplain.
src/accelerate/accelerator.py (Outdated)

```diff
@@ -494,11 +495,17 @@ def __init__(
         elif is_musa_available():
             self.scalar = torch.musa.amp.GradScaler(**kwargs)
         elif is_npu_available():
-            self.scaler = torch.npu.amp.GradScaler(**kwargs)
+            if version.parse(torch.__version__) > version.parse("2.3"):
```
Since neither of us can test this, let's not touch NPU for now.
@BenjaminBossan Thanks again for the comments. I have reverted the changes related to the documentation and NPU. 🤗
Seems there is a CI issue of its own; the test environment lacks Pillow:

```
=========================== short test summary info ============================
FAILED tests/trainer/test_trainer.py::TrainerIntegrationTest::test_trainer_saves_image_processor - ImportError:
CLIPImageProcessor requires the PIL library but it was not found in your environment. You can install it with pip:
`pip install pillow`. Please note that you may need to restart your runtime after installation.
FAILED tests/trainer/test_trainer.py::TrainerIntegrationTest::test_trainer_saves_processor - ImportError:
CLIPImageProcessor requires the PIL library but it was not found in your environment. You can install it with pip:
`pip install pillow`. Please note that you may need to restart your runtime after installation.
=========== 2 failed, 172 passed, 81 skipped, 42 warnings in 28.79s ============
```
Yes, appears to be unrelated.
LGTM, thanks for your continued work on this PR. I'll leave this open for Zach to give the final review.
@BenjaminBossan, any news on Zach's review of this PR? 🤗
Thanks, this looks okay to me. This hints that we should probably have a `get_grad_scaler` util func in Accelerate which can handle the backends automatically as part of the ops util package, but for the scope of this PR this works great.

Thanks!
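As a rough, hypothetical sketch of what such a `get_grad_scaler` utility could look like (this is not code from the PR or from Accelerate; `is_npu_available` and `is_musa_available` are the existing availability helpers referenced in the diffs above):

```python
import torch
from packaging import version

from accelerate.utils import is_musa_available, is_npu_available


def get_grad_scaler(**kwargs):
    # Pick the GradScaler matching the active backend, preferring the
    # unified torch.amp.GradScaler API where PyTorch provides it.
    if is_npu_available():
        return torch.npu.amp.GradScaler(**kwargs)
    if is_musa_available():
        return torch.musa.amp.GradScaler(**kwargs)
    if version.parse(torch.__version__) > version.parse("2.3"):
        return torch.amp.GradScaler("cuda", **kwargs)
    return torch.cuda.amp.GradScaler(**kwargs)
```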
With the release of PyTorch 2.4, the `torch.cuda.amp.GradScaler` API has been officially marked as deprecated. This pull request (PR) addresses the following issue:
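For reference, the change boils down to a rename, with the device now passed explicitly (the unified class is available from PyTorch 2.3 on):

```python
import torch

# Deprecated: emits a FutureWarning from PyTorch 2.4 onwards
scaler = torch.cuda.amp.GradScaler()

# Replacement on PyTorch >= 2.3: same interface, explicit device argument
scaler = torch.amp.GradScaler("cuda")
```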