@drbh drbh commented Aug 25, 2025

This article explores three different ways to compute Mixture of Experts (MoE) and focuses on the "how" of MoE.

@merveenoyan merveenoyan left a comment

TIL! 💗


# Three MoEs

Three Ways to Compute Mixture of Experts (MoE) in PyTorch

might be nice to put this in the title


Three Ways to Compute Mixture of Experts (MoE) in PyTorch

Mixture of Experts (MoE) looks complex, but under the hood it’s just:

perhaps link to this blog so people have more initial context https://huggingface.co/blog/moe


## Step 1: Routing

Every token chooses its top-k experts with softmaxed scores.
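
As a rough illustration of the routing step, the sketch below scores each token against every expert, applies a softmax, and keeps the top-k experts per token. The router layer, the tensor sizes, and the renormalisation of the kept weights are assumptions made for this example, not details taken from the article; some implementations instead pick the top-k logits first and apply the softmax only over those.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumed, not from the article).
num_tokens, hidden_size, num_experts, top_k = 4, 8, 4, 2

tokens = torch.randn(num_tokens, hidden_size)
# Hypothetical router: one linear layer producing a score per expert.
router = torch.nn.Linear(hidden_size, num_experts, bias=False)

router_logits = router(tokens)                       # (num_tokens, num_experts)
router_probs = torch.softmax(router_logits, dim=-1)  # softmax over all experts

# Keep the top_k experts per token, with their probabilities.
top_weights, top_experts = torch.topk(router_probs, top_k, dim=-1)

# Renormalise so each token's kept weights sum to 1 (one common choice, assumed here).
top_weights = top_weights / top_weights.sum(dim=-1, keepdim=True)
```

Here `top_experts[t]` holds the expert indices chosen by token `t`, and `top_weights[t]` holds the matching mixing weights.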

would be nice to give just a little bit more context

2. Apply MLPs (one per expert).
3. Recombine outputs with routing weights.
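
Steps 2 and 3 can be sketched with a plain loop over experts. This is a minimal, unoptimised illustration under assumed sizes, with a hypothetical two-layer MLP per expert; it is not necessarily one of the three implementations the article goes on to describe.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative sizes (assumed, not from the article).
num_tokens, hidden_size, num_experts, top_k = 6, 8, 4, 2

tokens = torch.randn(num_tokens, hidden_size)
router = nn.Linear(hidden_size, num_experts, bias=False)  # hypothetical router
experts = nn.ModuleList([
    nn.Sequential(  # hypothetical expert MLP
        nn.Linear(hidden_size, 4 * hidden_size),
        nn.GELU(),
        nn.Linear(4 * hidden_size, hidden_size),
    )
    for _ in range(num_experts)
])

with torch.no_grad():
    # Step 1: routing (softmax over experts, then top-k).
    probs = torch.softmax(router(tokens), dim=-1)
    weights, expert_ids = torch.topk(probs, top_k, dim=-1)

    output = torch.zeros_like(tokens)
    for expert_id, expert in enumerate(experts):
        # Rows are tokens routed to this expert; columns are their top-k slots.
        token_idx, slot_idx = torch.where(expert_ids == expert_id)
        if token_idx.numel() == 0:
            continue  # no token picked this expert
        # Step 2: run this expert's MLP on its tokens only.
        expert_out = expert(tokens[token_idx])
        # Step 3: scale by the routing weight and add into the output.
        scale = weights[token_idx, slot_idx].unsqueeze(-1)
        output.index_add_(0, token_idx, scale * expert_out)
```

Each token's output is the weighted sum of the outputs of the experts it selected; experts that receive no tokens are skipped.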

Below are **three ways** to compute MoE in PyTorch — from simple to complex.

Suggested change (current line, then proposed replacement):
Below are **three ways** to compute MoE in PyTorch — from simple to complex.
Below are **three ways** to build MoEs in PyTorch — from simple to complex.

Quick test:

```python
hs = torch.randn(B, S, H)
```

very nice to have simple reproducible snippets, would be nice to put these into a notebook and link at the end so people don't have to copy paste to try


in general let's avoid single letter variable names and find meaningful ones!
