-
Notifications
You must be signed in to change notification settings - Fork 415
[float8] add float8 rowwise MoE prototype #1245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1e55807
9325c13
75328ea
de7cf4e
712828f
5df35ce
b2c60fa
43afdef
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -69,3 +69,4 @@ selective_ac_option = '2' # 'int' = ac every positive int layer or 'op', ac bas | |
enable_fsdp_float8_all_gather = false | ||
precompute_float8_dynamic_scale_for_fsdp = false | ||
filter_fqns = ["output", "router.gate"] | ||
moe_fqns = ["experts"] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you want to capture the shared expert? If so may need to use "expert" instead of "experts" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this is well-tested, let's put it into the other toml configs as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not yet, this is intentional - the routed experts work with FSDP and TP, but shared expert only works with FSDP right now. Still debugging an issue related to shared expert + TP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add "prototype" to config name?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkuzo requested "prototype" be in the field name here. Unless I misunderstood the suggestion?
Alternatively we could omit "prototype" from the field name and just make sure the docstring/help text is very clear it is a prototype feature with limitations.
For context, I don't plan to land this until at least FSDP is supported (ideally TP as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK either way then. Also since this is an experiment folder, everything could be experimental.