Refactoring Llama attention and MLP layers #589
Conversation
Module scope for LinearAllreduce. This change allows better memory consumption and better optimizations in Synapse. Change-Id: I3a30a09d6d61aece7ce605bb672e1485d3fbe1cc
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM
I just left one last comment that should be quick to address.
Besides, do you have numbers showing how much memory is saved by doing this?
Command line: note that I'm already running on 1.14, but I don't think the numbers changed much from 1.13. With change vs. reference:
Module scope for LinearAllreduce.
This change allows better memory consumption and better optimizations in Synapse when running Llama 70B on DeepSpeed.
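For readers outside the thread, here is a minimal sketch (not the actual implementation in this PR) of what grouping the row-parallel output projection and its tensor-parallel all-reduce into a single module scope could look like. The class name `ScopedLinearAllreduce` and its interface are hypothetical; the sketch uses only standard PyTorch collectives.

```python
import torch
import torch.nn as nn
import torch.distributed as dist


class ScopedLinearAllreduce(nn.Module):
    """Hypothetical sketch: keep the sharded linear projection and the
    tensor-parallel all-reduce inside one module scope, so the graph
    compiler (e.g. Synapse) can optimize them together and release
    intermediate tensors earlier."""

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor = None, mp_group=None):
        super().__init__()
        self.weight = nn.Parameter(weight)
        self.bias = nn.Parameter(bias) if bias is not None else None
        self.mp_group = mp_group

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Local matmul on this tensor-parallel shard.
        output = torch.matmul(hidden_states, self.weight.t())
        # Sum the partial results across the model-parallel group.
        if self.mp_group is not None and dist.is_initialized():
            dist.all_reduce(output, group=self.mp_group)
        # Add the bias once, after the reduction.
        if self.bias is not None:
            output = output + self.bias
        return output
```

Keeping the matmul, all-reduce, and bias add under one module scope is the kind of grouping that lets the graph compiler fuse them and free partial outputs sooner, which is presumably where the memory savings discussed above come from.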