Add Standalone Mixture of Experts example #2221
base: master
Conversation
Summary of Changes: Hello @Abhinavexists, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a comprehensive example for implementing and utilizing Mixture-of-Experts (MoE) layers within Keras. It provides custom DenseMoE and Conv2DMoE layers, together with a CIFAR-10 example demonstrating their usage.
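For readers skimming the thread, here is a minimal sketch of the kind of layer the PR adds: each expert is a Dense layer and a softmax gating network mixes their outputs per example. This is an illustration only; the class name, constructor arguments, and internals are assumptions, not the PR's actual code.

```python
import keras
from keras import ops


class DenseMoESketch(keras.layers.Layer):
    """Illustrative only -- not the PR's DenseMoE implementation."""

    def __init__(self, units, num_experts=4, activation=None, **kwargs):
        super().__init__(**kwargs)
        # One Dense layer per expert, plus a softmax gate over the experts.
        self.experts = [
            keras.layers.Dense(units, activation=activation)
            for _ in range(num_experts)
        ]
        self.gate = keras.layers.Dense(num_experts, activation="softmax")

    def call(self, inputs):
        gates = self.gate(inputs)  # (batch, num_experts)
        # Stack expert outputs along a trailing axis: (batch, units, num_experts).
        expert_outputs = ops.stack([e(inputs) for e in self.experts], axis=-1)
        # Gate-weighted mixture of the experts, computed per example.
        return ops.einsum("bue,be->bu", expert_outputs, gates)
```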
Code Review
This pull request introduces an excellent example of Mixture-of-Experts (MoE) layers for Keras. The implementation of DenseMoE and Conv2DMoE is clear and the example is well-structured, demonstrating their usage and benefits effectively. My review focuses on a few key areas for improvement:
- A critical performance issue in Conv2DMoE due to a loop that can be vectorized (see the sketch after this comment).
- A bug in the expert utilization analysis section that uses the wrong activation function.
- A suggestion to improve the robustness of the analysis code by naming layers instead of using brittle indexing.
Overall, this is a great contribution and with these changes, it will be even better.
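On the "loop that can be vectorized" point: the conversation doesn't include the original code, but a common way to remove a per-expert accumulation loop is to stack the expert outputs and fold in the gate weights with a single einsum. A sketch under those assumptions, with a (batch, H, W, C) feature-map layout and gates of shape (batch, num_experts):

```python
from keras import ops


def combine_conv_experts(expert_layers, gates, inputs):
    """Gate-weighted sum of Conv2D expert outputs without a Python
    accumulation loop. `gates` has shape (batch, num_experts)."""
    # (batch, H, W, C, num_experts)
    expert_outputs = ops.stack([expert(inputs) for expert in expert_layers], axis=-1)
    # Weight each expert's feature map by its gate and sum over the experts.
    return ops.einsum("bhwce,be->bhwc", expert_outputs, gates)
```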
Added conditional gating activation for gating weights.
/gemini review
Code Review
This pull request adds a great example for Standalone Mixture-of-Experts layers in Keras. The implementation of DenseMoE and Conv2DMoE is clear and the example is well-structured. I've found a couple of critical issues in the Conv2DMoE implementation related to dynamic shapes that would prevent it from running correctly. I've also included some suggestions to make the expert utilization analysis code more robust and maintainable. Overall, this is a valuable addition once the issues are addressed.
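The thread doesn't show the exact dynamic-shape fix that was applied, but the usual Keras 3 pattern is to avoid reading the static batch dimension (which is None at graph-construction time) and instead let reshape infer it, or derive gating features without touching it at all. A small illustration, assuming a (batch, H, W, C) feature map with static spatial dimensions:

```python
from keras import ops


def pool_for_gating(feature_map):
    """(batch, H, W, C) -> (batch, C) gating features; never touches the
    batch dimension, so it works when the static batch size is None."""
    return ops.mean(feature_map, axis=(1, 2))


def flatten_spatial(feature_map):
    """(batch, H, W, C) -> (batch, H*W*C). Using -1 lets the backend infer
    the batch size instead of reading feature_map.shape[0]."""
    _, h, w, c = feature_map.shape
    return ops.reshape(feature_map, (-1, h * w * c))
```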
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces an excellent example of standalone Mixture-of-Experts (MoE) layers for Keras. The implementation of DenseMoE and Conv2DMoE is clear, efficient, and well-documented. The accompanying CIFAR-10 example effectively demonstrates their usage and provides a valuable comparison against a baseline model. I have one suggestion to improve the maintainability of the expert utilization analysis code. Overall, this is a high-quality contribution.
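One way the maintainability suggestion could look in practice: give the MoE layers explicit names, look them up with model.get_layer instead of positional indexing, and probe the gate on the features feeding the layer. The layer name, the `gate` attribute, and the `model` / `x_test` variables below are assumptions based on the example's described CIFAR-10 workflow, not the PR's actual code.

```python
import keras

# Assumes the model was built with e.g. DenseMoE(..., name="moe_dense_1") and
# that the layer exposes its gating network as a `gate` attribute.
moe = model.get_layer("moe_dense_1")                 # robust: look up by name, not index
feature_model = keras.Model(model.input, moe.input)  # features feeding the MoE layer

features = feature_model.predict(x_test, verbose=0)
gate_probs = keras.ops.convert_to_numpy(moe.gate(features))  # (N, num_experts)

utilization = gate_probs.mean(axis=0)                # average routing weight per expert
for i, share in enumerate(utilization):
    print(f"expert {i}: {share:.3f}")
```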
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces a well-structured and comprehensive example for Standalone Mixture-of-Experts (MoE) layers in Keras, including DenseMoE and Conv2DMoE. The implementation is clean and the example effectively demonstrates their usage, comparison against a baseline, and analysis of expert utilization. My review focuses on improving code clarity and removing some redundancies for a more streamlined and educational example. I've suggested simplifying the activation function calls within the custom layers and refactoring the expert utilization analysis to be more direct and easier to follow.
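On the "simplifying the activation function calls" point: a common Keras idiom is to resolve the activation once with keras.activations.get (which accepts a string, a callable, or None) and store the resulting callable, so call() applies it without any branching. A minimal, standalone illustration of that idiom:

```python
import numpy as np
import keras

# keras.activations.get turns "relu", a callable, or None (identity) into a
# single callable; a layer can store this in __init__ and just apply it in call().
activation = keras.activations.get("relu")

x = np.array([[-1.0, 2.0]], dtype="float32")
print(keras.ops.convert_to_numpy(activation(x)))  # [[0. 2.]]
```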
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Refactor gating weight computation to use learned parameters directly, improving efficiency.
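One reading of "use learned parameters directly": compute the gate logits from weights created with add_weight (a single matmul plus bias and softmax) rather than routing through a wrapper layer. A sketch with illustrative names and shapes, not the PR's exact code:

```python
from keras import ops


def compute_gates(inputs, gate_kernel, gate_bias):
    """Gate weights straight from learned parameters.
    inputs: (batch, features); gate_kernel: (features, num_experts);
    gate_bias: (num_experts,). Returns (batch, num_experts) softmax weights."""
    logits = ops.matmul(inputs, gate_kernel) + gate_bias
    return ops.softmax(logits, axis=-1)
```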
/gemini review
Code Review
This pull request introduces an excellent and well-documented example of standalone Mixture-of-Experts (MoE) layers for Keras. The implementations of DenseMoE and Conv2DMoE are correct and demonstrate an efficient approach. My review includes a few suggestions to enhance the reusability and maintainability of these new layers by setting more sensible defaults and refactoring duplicated code.
Added load balancing loss to encourage uniform expert utilization in the DenseMoE and Conv2DMoE layers. Included visualization of training history and expert utilization analysis.
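The load-balancing commit doesn't appear inline in the thread; a common formulation for such an auxiliary loss is the squared coefficient of variation of per-expert importance (the summed gate weights), which is zero when utilization is perfectly uniform. A sketch of that idea, which a layer could register via self.add_loss inside call(); the PR's exact formulation may differ.

```python
from keras import ops


def load_balancing_loss(gates, eps=1e-8):
    """`gates`: (batch, num_experts) softmax gate weights. Returns the squared
    coefficient of variation of per-expert importance; 0 means perfectly
    uniform expert utilization."""
    importance = ops.sum(gates, axis=0)               # total routing mass per expert
    mean = ops.mean(importance)
    variance = ops.mean(ops.square(importance - mean))
    return variance / (ops.square(mean) + eps)
```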
Add standalone Mixture-of-Experts (MoE) layers for Keras, usable as drop-in replacements for Dense and Conv2D. Includes a full example on CIFAR-10 demonstrating:
- usage of the DenseMoE and Conv2DMoE layers (a usage sketch follows below)
- comparison against a baseline model
- a load-balancing loss to encourage uniform expert utilization
- expert utilization analysis and training history visualization
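A sketch of how the layers might be dropped into a small CIFAR-10 model. The constructor signatures and layer names here are assumptions for illustration (Conv2DMoE(filters, kernel_size, num_experts=...), DenseMoE(units, num_experts=...)); see the example source in the PR for the actual API.

```python
import keras

# Conv2DMoE and DenseMoE are the layers defined in this example; the argument
# names below are assumed for illustration.
inputs = keras.Input(shape=(32, 32, 3))  # CIFAR-10 images
x = Conv2DMoE(32, 3, num_experts=4, activation="relu", name="moe_conv_1")(inputs)
x = keras.layers.MaxPooling2D()(x)
x = keras.layers.Flatten()(x)
x = DenseMoE(128, num_experts=4, activation="relu", name="moe_dense_1")(x)
outputs = keras.layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```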