
Conversation

RezaYazdaniAminabadi
Contributor

After refactoring the running scripts, we missed passing the local_rank argument that the Transformer kernel requires to run on multiple GPUs. I added it to the transformer_kernel configuration. Also, torch.distributed needs to be initialized before the model is created in nvidia_run_squad_deepspeed.py; otherwise, it fails when running the baseline. The rest of the changes are due to formatting.
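A minimal sketch of the two changes described above, assuming an argparse-style `--local_rank` flag and a dict-based transformer_kernel configuration as in the SQuAD example; the exact argument names and the `build_model` helper here are illustrative, not the actual script code:

```python
import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1,
                    help="Local rank passed in by the distributed launcher")
args = parser.parse_args()

# Initialize torch.distributed *before* constructing the model, so the
# baseline (non-DeepSpeed) path also runs correctly on multiple GPUs.
if args.local_rank != -1:
    torch.cuda.set_device(args.local_rank)
    dist.init_process_group(backend="nccl")

# Pass local_rank into the transformer-kernel configuration so the
# custom kernel knows which GPU it is running on.
transformer_kernel_config = {
    # ... other kernel settings ...
    "local_rank": args.local_rank,
}

model = build_model(transformer_kernel_config)  # hypothetical helper
```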
