fix deepspeed job #37284
@@ -382,7 +382,7 @@ jobs:
       run: pip freeze

   - name: Set `machine_type` for report and artifact names
-    working-directory: /transformers
+    working-directory: ${{ inputs.working-directory-prefix }}/transformers
Only this change is relevant; the other changes will be reverted before merge.
I am moving on from this, and I hope someone will update the DeepSpeed Dockerfile (docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile) soon. We can discuss it offline.
Branch updated from bc05d2f to 2e2e66d.
cc @S1ro1 (not sure whether this is relevant, though)
Thanks! I left a question.
run: python3 -m pip install numpy==1.24.3 numba==0.61.0 scipy==1.12.0 scikit-learn==1.6.1
Why do we need to install these at specific versions?
The Docker image build pulls in NumPy 2, and pytest fails with an error right from the start. I don't know the exact cause, but the Dockerfile is fairly old and needs an update; it still uses ARG PYTORCH='2.2.0'.
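Because the failure came from NumPy 2 slipping into the image, a quick post-install check could confirm the pins actually took effect. This is a minimal sketch under assumptions, not part of this PR: the `PINS` dict and the `find_mismatches` helper are hypothetical names, and only the standard-library `importlib.metadata.version` lookup is used.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical dict of the pins used in the `pip install` step of this PR.
PINS = {
    "numpy": "1.24.3",
    "numba": "0.61.0",
    "scipy": "1.12.0",
    "scikit-learn": "1.6.1",
}


def find_mismatches(pins, get_version=version):
    """Return {package: installed_version_or_None} for each pin that is not satisfied.

    `get_version` defaults to importlib.metadata.version and can be swapped
    for a stub when testing. A missing package is reported as None.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

In a CI step one could call `find_mismatches(PINS)` right after the install and fail the job if the result is non-empty; an exact string comparison is deliberately strict, since the point is to catch any drift such as `numpy` resolving to a 2.x release.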
I will update the Docker image and run the tests.
What does this PR do?
Fixes the wrong working_directory. Also installs some packages at specific versions.
With these changes, the pytest command runs:
https://github.com/huggingface/transformers/actions/runs/14269422275/job/39998982497
However, I will leave updating the DeepSpeed Docker image to you all: it's quite old.