
Conversation

@Abhinavexists

Add standalone Mixture-of-Experts (MoE) layers for Keras, usable as drop-in replacements for Dense and Conv2D; a minimal soft-routing sketch follows the feature list below. Includes a full example on CIFAR-10 demonstrating:

  • DenseMoE: soft-routed expert networks for fully connected layers
  • Conv2DMoE: convolutional expert networks with 1x1 gating
  • Expert specialization through softmax gating
  • Performance comparison against a standard baseline model
  • Expert utilization inspection via gating statistics
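As a quick, hedged illustration of the soft routing and softmax gating listed above, here is a minimal sketch of what a DenseMoE forward pass can look like. This is an assumption about the general design, not the PR's actual code; names such as `num_experts`, `expert_kernel`, and `gate_kernel` are illustrative, and activation handling is omitted for brevity.

```python
import keras
from keras import ops


class DenseMoE(keras.layers.Layer):
    """Minimal soft-routed mixture of dense experts (illustrative sketch)."""

    def __init__(self, units, num_experts=4, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.num_experts = num_experts

    def build(self, input_shape):
        in_dim = input_shape[-1]
        # One kernel and bias per expert, stored as stacked weights.
        self.expert_kernel = self.add_weight(
            shape=(self.num_experts, in_dim, self.units), name="expert_kernel"
        )
        self.expert_bias = self.add_weight(
            shape=(self.num_experts, self.units), initializer="zeros", name="expert_bias"
        )
        # Gating parameters: one logit per expert.
        self.gate_kernel = self.add_weight(
            shape=(in_dim, self.num_experts), name="gate_kernel"
        )
        self.gate_bias = self.add_weight(
            shape=(self.num_experts,), initializer="zeros", name="gate_bias"
        )

    def call(self, inputs):
        # Soft routing weights: (batch, num_experts), each row sums to 1.
        gates = ops.softmax(
            ops.matmul(inputs, self.gate_kernel) + self.gate_bias, axis=-1
        )
        # All experts evaluated at once: (batch, num_experts, units).
        expert_out = ops.einsum("bi,eio->beo", inputs, self.expert_kernel) + self.expert_bias
        # Weighted mixture over the expert axis: (batch, units).
        return ops.sum(expert_out * ops.expand_dims(gates, axis=-1), axis=1)
```

Because every expert sees every input and the gate only re-weights their outputs, the layer stays differentiable end to end and trains with plain backpropagation, which is what makes it usable as a drop-in replacement for Dense.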

@gemini-code-assist
Contributor

Summary of Changes

Hello @Abhinavexists, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive example for implementing and utilizing Mixture-of-Experts (MoE) layers within Keras. It provides custom DenseMoE and Conv2DMoE layers that can serve as direct replacements for their standard counterparts, enabling the creation of models with increased capacity through specialized expert networks and learned routing mechanisms. The example demonstrates the practical application of these layers on the CIFAR-10 image classification task, including a comparison of performance against a traditional CNN baseline and an analysis of how experts are utilized.
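To make the "1x1 gating" routing mentioned above concrete, here is a hedged sketch of how a Conv2DMoE layer can be structured: each expert is an ordinary Conv2D, and a 1x1 convolution produces per-pixel softmax routing weights over the experts. This illustrates the general technique under assumed names (`experts`, `gate`) rather than reproducing the PR's implementation.

```python
import keras
from keras import layers, ops


class Conv2DMoE(keras.layers.Layer):
    """Mixture of Conv2D experts with per-pixel softmax gating (illustrative sketch)."""

    def __init__(self, filters, kernel_size, num_experts=4, padding="same", **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.num_experts = num_experts
        # One ordinary Conv2D per expert.
        self.experts = [
            layers.Conv2D(filters, kernel_size, padding=padding, name=f"expert_{i}")
            for i in range(num_experts)
        ]
        # A 1x1 convolution emits one gating logit per expert at every spatial position.
        self.gate = layers.Conv2D(num_experts, 1, padding=padding, name="gate")

    def call(self, inputs):
        # (batch, H, W, num_experts), softmax over the expert axis at each pixel.
        gates = ops.softmax(self.gate(inputs), axis=-1)
        # (batch, H, W, num_experts, filters)
        expert_out = ops.stack([expert(inputs) for expert in self.experts], axis=-2)
        # Weighted per-pixel mixture -> (batch, H, W, filters)
        return ops.sum(expert_out * ops.expand_dims(gates, axis=-1), axis=-2)
```

The per-expert Python loop in call() is the simplest formulation; the review discussion below flags exactly this kind of loop as a performance concern and asks for a vectorized form (for example, a single stacked kernel or grouped convolution).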

Highlights

  • New Example Added: A new standalone example for Mixture-of-Experts (MoE) layers has been added to Keras, demonstrating how to implement and use these layers.
  • Custom MoE Layers: Introduces DenseMoE and Conv2DMoE layers, designed as drop-in replacements for standard Keras Dense and Conv2D layers, respectively.
  • CIFAR-10 Demonstration: The example includes a full demonstration on the CIFAR-10 dataset, showcasing the application of MoE layers in a convolutional neural network.
  • Performance Comparison and Analysis: The example provides a performance comparison between an MoE model and a standard baseline model, along with an inspection of expert utilization via gating statistics.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces an excellent example of Mixture-of-Experts (MoE) layers for Keras. The implementation of DenseMoE and Conv2DMoE is clear and the example is well-structured, demonstrating their usage and benefits effectively. My review focuses on a few key areas for improvement:

  • A critical performance issue in Conv2DMoE due to a loop that can be vectorized.
  • A bug in the expert utilization analysis section that uses the wrong activation function.
  • A suggestion to improve the robustness of the analysis code by naming layers instead of using brittle indexing (see the analysis sketch after this list).

Overall, this is a great contribution, and with these changes it will be even better.
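Two of these points (the activation bug and the brittle indexing) concern the analysis code rather than the layers; the vectorization point corresponds to computing all experts in one stacked operation, as in the einsum form sketched earlier in this thread. Below is a hedged sketch of what the corrected analysis can look like: the MoE layer is looked up by an explicit name instead of a positional index, and the gates are recomputed with softmax, the same activation the layer uses for routing. The layer name "dense_moe" and the weight attributes `gate_kernel` / `gate_bias` are assumptions, and a functional model is assumed.

```python
import numpy as np
import keras
from keras import ops


def expert_utilization(model, x, moe_name="dense_moe"):
    """Mean softmax routing weight per expert for the named MoE layer (sketch)."""
    moe = model.get_layer(moe_name)                      # by name, not model.layers[<index>]
    feature_model = keras.Model(model.input, moe.input)  # inputs feeding the MoE layer
    features = feature_model.predict(x, verbose=0)

    gate_w = ops.convert_to_numpy(moe.gate_kernel)       # attribute names are assumptions
    gate_b = ops.convert_to_numpy(moe.gate_bias)
    logits = features @ gate_w + gate_b
    gates = np.exp(logits - logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)           # softmax: the routing activation

    return gates.mean(axis=0)                            # one average weight per expert
```

If the returned values are close to uniform, the experts are being used evenly; a single value near 1.0 means the gate has collapsed onto one expert.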

@Abhinavexists
Author

/gemini review

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request adds a great example for Standalone Mixture-of-Experts layers in Keras. The implementation of DenseMoE and Conv2DMoE is clear and the example is well-structured. I've found a couple of critical issues in the Conv2DMoE implementation related to dynamic shapes that would prevent it from running correctly. I've also included some suggestions to make the expert utilization analysis code more robust and maintainable. Overall, this is a valuable addition once the issues are addressed.
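The PR diff is not shown in this thread, but here is a hedged illustration of the kind of dynamic-shape fix such a comment usually points to: the batch dimension is unknown when call() is traced, so a reshape of stacked expert outputs cannot read it from a static shape and should let the backend infer it instead. The helper name below is illustrative, not the PR's code.

```python
from keras import ops


def split_expert_channels(expert_outputs, num_experts, filters):
    """(batch, H, W, num_experts * filters) -> (batch, H, W, num_experts, filters)."""
    # Spatial dims are static for fixed-size inputs such as CIFAR-10's 32x32 images;
    # the batch dimension is not, so use -1 rather than expert_outputs.shape[0],
    # which is None inside a symbolic call.
    _, h, w, _ = expert_outputs.shape
    return ops.reshape(expert_outputs, (-1, h, w, num_experts, filters))
```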

Abhinavexists and others added 3 commits December 10, 2025 16:14
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Abhinavexists
Author

/gemini review

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces an excellent example of standalone Mixture-of-Experts (MoE) layers for Keras. The implementation of DenseMoE and Conv2DMoE is clear, efficient, and well-documented. The accompanying CIFAR-10 example effectively demonstrates their usage and provides a valuable comparison against a baseline model. I have one suggestion to improve the maintainability of the expert utilization analysis code. Overall, this is a high-quality contribution.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Abhinavexists
Author

/gemini review

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-structured and comprehensive example for Standalone Mixture-of-Experts (MoE) layers in Keras, including DenseMoE and Conv2DMoE. The implementation is clean and the example effectively demonstrates their usage, comparison against a baseline, and analysis of expert utilization. My review focuses on improving code clarity and removing some redundancies for a more streamlined and educational example. I've suggested simplifying the activation function calls within the custom layers and refactoring the expert utilization analysis to be more direct and easier to follow.
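A hedged sketch of what "simplifying the activation function calls" typically amounts to in a custom Keras layer (an assumption about the shape of the change, not the PR's code): resolve the activation once with keras.activations.get, which maps a string, a callable, or None to a callable, so call() ends with a single unconditional application instead of branching on the activation type.

```python
import keras


class ActivationExample(keras.layers.Layer):
    """Pass-through layer showing the resolved-activation pattern (sketch)."""

    def __init__(self, activation=None, **kwargs):
        super().__init__(**kwargs)
        # None resolves to the linear (identity) activation; strings like "relu"
        # resolve to the corresponding function.
        self.activation = keras.activations.get(activation)

    def call(self, inputs):
        # In the real MoE layers, the expert mixture would be computed first;
        # this sketch only shows the final, unconditional activation call.
        return self.activation(inputs)
```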

Abhinavexists and others added 3 commits December 10, 2025 16:29
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Refactor gating weight computation to use learned parameters directly, improving efficiency.
@Abhinavexists
Author

/gemini review

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces an excellent and well-documented example of standalone Mixture-of-Experts (MoE) layers for Keras. The implementations of DenseMoE and Conv2DMoE are correct and demonstrate an efficient approach. My review includes a few suggestions to enhance the reusability and maintainability of these new layers by setting more sensible defaults and refactoring duplicated code.
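A hedged sketch of the kind of refactor such a suggestion usually points at; the helper name, its signature, and the default value are assumptions, not the PR's code. The idea is that DenseMoE and Conv2DMoE share one soft-mixing step, so it can live in a single function, and a default expert count keeps the layers usable with no extra arguments.

```python
from keras import ops

DEFAULT_NUM_EXPERTS = 4  # assumed "sensible default" so DenseMoE(128) just works


def mix_experts(expert_outputs, gate_logits):
    """Shared soft-mixture step usable by both MoE layers (sketch).

    expert_outputs: (..., num_experts, units_or_filters)
    gate_logits:    (..., num_experts)
    """
    gates = ops.softmax(gate_logits, axis=-1)
    return ops.sum(expert_outputs * ops.expand_dims(gates, axis=-1), axis=-2)
```

The same call covers the dense case, where the leading axes are just the batch, and the convolutional case, where they are the batch plus the two spatial dimensions.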

Added load balancing loss to encourage uniform expert utilization in the DenseMoE and Conv2DMoE layers. Included visualization of training history and expert utilization analysis.
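The exact form of the load-balancing term is not shown in this thread. As a hedged sketch, one common choice penalizes the squared coefficient of variation of the per-expert importance, which is zero when every expert receives the same average routing weight; the function name and the weighting factor below are assumptions.

```python
from keras import ops


def load_balancing_loss(gates):
    """gates: (batch, num_experts) softmax routing weights; returns a scalar penalty."""
    importance = ops.mean(gates, axis=0)        # average routing weight per expert
    mean = ops.mean(importance)
    variance = ops.mean((importance - mean) ** 2)
    return variance / (mean**2 + 1e-8)          # CV^2: 0 when perfectly balanced


# Inside a layer's call(), something like
#     self.add_loss(self.balance_weight * load_balancing_loss(gates))
# adds the penalty to the model's total loss alongside the task loss.
```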
