Update PyTorch Llama3 70B recipe to calculate metrics from profile #29
base: main
Conversation
Very cool work, @bhavya01! Do you have a plan to add this new metric to Mixtral8_7B too?
Yes, will do it for Mixtral too.
@@ -1,6 +1,5 @@
 # Base package containing nightly PyTorch/XLA
-ARG BASE_IMAGE=us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm
-FROM ${BASE_IMAGE}
+FROM us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm_cxx11_20250211
Did you try running with the 20250211 base image on a full pod? Context: pytorch/xla#8683
Good catch! Let me try running on a full pod as well.
(`us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-xla/llama3-70b:feb14build`).
The docker image uses the torch and torch_xla nightly builds from 02/11/2025.
Could we instead create a label for the currently used test image and rotate that between different versions? This could avoid possible human error and remove the need to change the version here for every update.
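A minimal sketch of that label-rotation idea, assuming a hypothetical floating tag named `llama3-70b-current` (not an existing tag in the registry): the Dockerfile keeps the parameterized base image and points it at the floating label, which is re-pointed to the desired pinned nightly build outside the recipe.

```Dockerfile
# Hypothetical floating label, re-tagged out of band to whichever pinned
# nightly build (e.g. nightly_3.10_tpuvm_cxx11_20250211) the recipe should use.
ARG BASE_IMAGE=us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:llama3-70b-current
FROM ${BASE_IMAGE}
```

Rotating to a newer build would then be a registry-side re-tag rather than an edit to this file.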
I only updated the step time calculation in the branch
[flash_attention_minibatch_v6e](https://github.com/pytorch-tpu/transformers/compare/flash_attention_minibatch_v6e)
with the commit pytorch-tpu/transformers@b185651. The output of the script looks as follows: