failed (exitcode: -8) local_rank: 6 (pid: 58423) of binary: /opt/miniconda/bin/python when running GRPO #254
Comments
To get more information about this exception, I added the following code to grpo.py:
The log then became:
|
This is probably due to CUDA and torch environment issues. I ran it on an A100 with CUDA 11.8, and GRPO worked fine when using zero2.yaml. But when running on an H20 with CUDA 12.2, the above problem occurs. |
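For reference, the `exitcode: -8` in the title follows the usual convention for Python-launched child processes: a negative code `-N` typically means the process was killed by signal `N`. A minimal sketch for decoding it, using only the standard library; the helper name `describe_exit_code` is hypothetical and not part of torch or this repository:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a child-process exit code to a readable description."""
    if code < 0:
        # Negative codes mean "terminated by signal -code".
        try:
            return signal.Signals(-code).name
        except ValueError:
            return f"unknown signal {-code}"
    return f"exit status {code}"


print(describe_exit_code(-8))  # SIGFPE on Linux
```

SIGFPE usually comes from native code (for example an integer division by zero in a compiled extension) rather than from a Python exception, which would be consistent with the environment and toolchain reports in this thread.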
I solved this problem by upgrading GCC to version 12. |
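To see which GCC a machine currently has before upgrading, you can ask the compiler directly. A minimal sketch, assuming `gcc` is on `PATH` and taking major version 12 (the version reported to work above) as the comparison point; the helper names `parse_major` and `gcc_major` are hypothetical:

```python
import shutil
import subprocess
from typing import Optional


def parse_major(version: str) -> int:
    """Return the major component of a version string such as '12' or '11.4.0'."""
    return int(version.strip().split(".")[0])


def gcc_major() -> Optional[int]:
    """Return the major version of the gcc found on PATH, or None if there is none."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    # `gcc -dumpversion` prints only the version number.
    out = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True, check=True)
    return parse_major(out.stdout)


major = gcc_major()
if major is None:
    print("gcc not found on PATH")
elif major < 12:
    print(f"gcc {major} found; the comment above reports success after upgrading to GCC 12")
else:
    print(f"gcc {major} found")
```

This only reports the compiler version; it does not confirm that the compiler is the cause of the crash on a given setup.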
I have the same problem with CUDA 12.4; gcc version is (openr1) ubuntu@192-222-54-131: Still the same issue? (8x H100) |
I have the same problem with CUDA 12.4. Any solutions? The code works if I only use 1 GPU, though; when I try with more than 1, I get the error. |
same !!!! |