Add Standalone Mixture of Experts example #2221
base: master
Conversation
Summary of Changes: Hello @Abhinavexists, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a comprehensive example for implementing and utilizing Mixture-of-Experts (MoE) layers within Keras. It provides custom DenseMoE and Conv2DMoE layers, together with a CIFAR-10 example demonstrating their usage.
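For readers skimming the thread, here is a minimal sketch of the kind of layer the PR adds: each expert is a Dense layer and a softmax gating network mixes their outputs per example. This is an illustration only; the class name, constructor arguments, and internals are assumptions, not the PR's actual code.

```python
import keras
from keras import ops


class DenseMoESketch(keras.layers.Layer):
    """Illustrative only -- not the PR's DenseMoE implementation."""

    def __init__(self, units, num_experts=4, activation=None, **kwargs):
        super().__init__(**kwargs)
        # One Dense layer per expert, plus a softmax gate over the experts.
        self.experts = [
            keras.layers.Dense(units, activation=activation)
            for _ in range(num_experts)
        ]
        self.gate = keras.layers.Dense(num_experts, activation="softmax")

    def call(self, inputs):
        gates = self.gate(inputs)  # (batch, num_experts)
        # Stack expert outputs along a trailing axis: (batch, units, num_experts).
        expert_outputs = ops.stack([e(inputs) for e in self.experts], axis=-1)
        # Gate-weighted mixture of the experts, computed per example.
        return ops.einsum("bue,be->bu", expert_outputs, gates)
```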
Code Review
This pull request introduces an excellent example of Mixture-of-Experts (MoE) layers for Keras. The implementation of DenseMoE and Conv2DMoE is clear and the example is well-structured, demonstrating their usage and benefits effectively. My review focuses on a few key areas for improvement:
- A critical performance issue in Conv2DMoE due to a loop that can be vectorized (see the sketch after this comment).
- A bug in the expert utilization analysis section that uses the wrong activation function.
- A suggestion to improve the robustness of the analysis code by naming layers instead of using brittle indexing.
Overall, this is a great contribution and with these changes, it will be even better.
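On the "loop that can be vectorized" point: the conversation doesn't include the original code, but a common way to remove a per-expert accumulation loop is to stack the expert outputs and fold in the gate weights with a single einsum. A sketch under those assumptions, with a (batch, H, W, C) feature-map layout and gates of shape (batch, num_experts):

```python
from keras import ops


def combine_conv_experts(expert_layers, gates, inputs):
    """Gate-weighted sum of Conv2D expert outputs without a Python
    accumulation loop. `gates` has shape (batch, num_experts)."""
    # (batch, H, W, C, num_experts)
    expert_outputs = ops.stack([expert(inputs) for expert in expert_layers], axis=-1)
    # Weight each expert's feature map by its gate and sum over the experts.
    return ops.einsum("bhwce,be->bhwc", expert_outputs, gates)
```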
Added conditional gating activation for gating weights.
/gemini review
Code Review
This pull request adds a great example for Standalone Mixture-of-Experts layers in Keras. The implementation of DenseMoE and Conv2DMoE is clear and the example is well-structured. I've found a couple of critical issues in the Conv2DMoE implementation related to dynamic shapes that would prevent it from running correctly. I've also included some suggestions to make the expert utilization analysis code more robust and maintainable. Overall, this is a valuable addition once the issues are addressed.
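The thread doesn't show the exact dynamic-shape fix that was applied, but the usual Keras 3 pattern is to avoid reading the static batch dimension (which is None at graph-construction time) and instead let reshape infer it, or derive gating features without touching it at all. A small illustration, assuming a (batch, H, W, C) feature map with static spatial dimensions:

```python
from keras import ops


def pool_for_gating(feature_map):
    """(batch, H, W, C) -> (batch, C) gating features; never touches the
    batch dimension, so it works when the static batch size is None."""
    return ops.mean(feature_map, axis=(1, 2))


def flatten_spatial(feature_map):
    """(batch, H, W, C) -> (batch, H*W*C). Using -1 lets the backend infer
    the batch size instead of reading feature_map.shape[0]."""
    _, h, w, c = feature_map.shape
    return ops.reshape(feature_map, (-1, h * w * c))
```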
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces an excellent example of standalone Mixture-of-Experts (MoE) layers for Keras. The implementation of DenseMoE and Conv2DMoE is clear, efficient, and well-documented. The accompanying CIFAR-10 example effectively demonstrates their usage and provides a valuable comparison against a baseline model. I have one suggestion to improve the maintainability of the expert utilization analysis code. Overall, this is a high-quality contribution.
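One way the maintainability suggestion could look in practice: give the MoE layers explicit names, look them up with model.get_layer instead of positional indexing, and probe the gate on the features feeding the layer. The layer name, the `gate` attribute, and the `model` / `x_test` variables below are assumptions based on the example's described CIFAR-10 workflow, not the PR's actual code.

```python
import keras

# Assumes the model was built with e.g. DenseMoE(..., name="moe_dense_1") and
# that the layer exposes its gating network as a `gate` attribute.
moe = model.get_layer("moe_dense_1")                 # robust: look up by name, not index
feature_model = keras.Model(model.input, moe.input)  # features feeding the MoE layer

features = feature_model.predict(x_test, verbose=0)
gate_probs = keras.ops.convert_to_numpy(moe.gate(features))  # (N, num_experts)

utilization = gate_probs.mean(axis=0)                # average routing weight per expert
for i, share in enumerate(utilization):
    print(f"expert {i}: {share:.3f}")
```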
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces a well-structured and comprehensive example for Standalone Mixture-of-Experts (MoE) layers in Keras, including DenseMoE and Conv2DMoE. The implementation is clean and the example effectively demonstrates their usage, comparison against a baseline, and analysis of expert utilization. My review focuses on improving code clarity and removing some redundancies for a more streamlined and educational example. I've suggested simplifying the activation function calls within the custom layers and refactoring the expert utilization analysis to be more direct and easier to follow.
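On the "simplifying the activation function calls" point: a common Keras idiom is to resolve the activation once with keras.activations.get (which accepts a string, a callable, or None) and store the resulting callable, so call() applies it without any branching. A minimal, standalone illustration of that idiom:

```python
import numpy as np
import keras

# keras.activations.get turns "relu", a callable, or None (identity) into a
# single callable; a layer can store this in __init__ and just apply it in call().
activation = keras.activations.get("relu")

x = np.array([[-1.0, 2.0]], dtype="float32")
print(keras.ops.convert_to_numpy(activation(x)))  # [[0. 2.]]
```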
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Refactor gating weight computation to use learned parameters directly, improving efficiency.
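One reading of "use learned parameters directly": compute the gate logits from weights created with add_weight (a single matmul plus bias and softmax) rather than routing through a wrapper layer. A sketch with illustrative names and shapes, not the PR's exact code:

```python
from keras import ops


def compute_gates(inputs, gate_kernel, gate_bias):
    """Gate weights straight from learned parameters.
    inputs: (batch, features); gate_kernel: (features, num_experts);
    gate_bias: (num_experts,). Returns (batch, num_experts) softmax weights."""
    logits = ops.matmul(inputs, gate_kernel) + gate_bias
    return ops.softmax(logits, axis=-1)
```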
/gemini review
Code Review
This pull request introduces an excellent and well-documented example of standalone Mixture-of-Experts (MoE) layers for Keras. The implementations of DenseMoE and Conv2DMoE are correct and demonstrate an efficient approach. My review includes a few suggestions to enhance the reusability and maintainability of these new layers by setting more sensible defaults and refactoring duplicated code.
Added load balancing loss to encourage uniform expert utilization in the DenseMoE and Conv2DMoE layers. Included visualization of training history and expert utilization analysis.
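The load-balancing commit doesn't appear inline in the thread; a common formulation for such an auxiliary loss is the squared coefficient of variation of per-expert importance (the summed gate weights), which is zero when utilization is perfectly uniform. A sketch of that idea, which a layer could register via self.add_loss inside call(); the PR's exact formulation may differ.

```python
from keras import ops


def load_balancing_loss(gates, eps=1e-8):
    """`gates`: (batch, num_experts) softmax gate weights. Returns the squared
    coefficient of variation of per-expert importance; 0 means perfectly
    uniform expert utilization."""
    importance = ops.sum(gates, axis=0)               # total routing mass per expert
    mean = ops.mean(importance)
    variance = ops.mean(ops.square(importance - mean))
    return variance / (ops.square(mean) + eps)
```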
Add standalone Mixture-of-Experts (MoE) layers for Keras, usable as drop-in replacements for Dense and Conv2D. Includes a full example on CIFAR-10 demonstrating:
- usage of the DenseMoE and Conv2DMoE layers (a usage sketch follows below)
- comparison against a baseline model
- a load-balancing loss to encourage uniform expert utilization
- expert utilization analysis and training history visualization
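A sketch of how the layers might be dropped into a small CIFAR-10 model. The constructor signatures and layer names here are assumptions for illustration (Conv2DMoE(filters, kernel_size, num_experts=...), DenseMoE(units, num_experts=...)); see the example source in the PR for the actual API.

```python
import keras

# Conv2DMoE and DenseMoE are the layers defined in this example; the argument
# names below are assumed for illustration.
inputs = keras.Input(shape=(32, 32, 3))  # CIFAR-10 images
x = Conv2DMoE(32, 3, num_experts=4, activation="relu", name="moe_conv_1")(inputs)
x = keras.layers.MaxPooling2D()(x)
x = keras.layers.Flatten()(x)
x = DenseMoE(128, num_experts=4, activation="relu", name="moe_dense_1")(x)
outputs = keras.layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```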