feat: article exploring ways to compute MoE #3038

drbh · 2025-08-25T14:32:05Z

This article explores three different ways to compute MoE, focusing on the "how" of MoE.

merveenoyan

TIL! 💗

merveenoyan · 2025-08-28T08:57:11Z

three-moes.md

+
+# Three MoEs
+
+Three Ways to Compute Mixture of Experts (MoE) in PyTorch


Might be nice to put this in the title.

merveenoyan · 2025-08-28T08:57:58Z

three-moes.md

+
+Three Ways to Compute Mixture of Experts (MoE) in PyTorch
+
+Mixture of Experts (MoE) looks complex, but under the hood it’s just:


perhaps link to this blog so people have more initial context https://huggingface.co/blog/moe

merveenoyan · 2025-08-28T08:58:22Z

three-moes.md

+
+## Step 1: Routing
+
+Every token chooses its top-k experts with softmaxed scores.


would be nice to give just a little bit more context
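
One possible way to add that context is a short routing sketch next to the sentence. The snippet below is not taken from the PR: the names `router_weight`, `num_experts`, and `top_k`, and the choice to apply the softmax over only the selected logits (rather than over all experts before selection), are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not values from the article).
batch_size, seq_len, hidden_size = 2, 4, 8
num_experts, top_k = 4, 2

hidden_states = torch.randn(batch_size, seq_len, hidden_size)

# Hypothetical router: a single projection producing one logit per expert.
router_weight = torch.randn(hidden_size, num_experts)

# Flatten (batch, sequence) so that every row is one token.
tokens = hidden_states.reshape(-1, hidden_size)            # (num_tokens, hidden_size)
router_logits = tokens @ router_weight                      # (num_tokens, num_experts)

# Each token keeps its top_k highest-scoring experts.
top_logits, top_experts = torch.topk(router_logits, top_k, dim=-1)

# Assumption: normalise over the selected experts only, so each row sums to 1.
routing_weights = torch.softmax(top_logits, dim=-1)         # (num_tokens, top_k)
```

Here `top_experts[i]` holds the expert indices chosen by token `i`, and `routing_weights[i]` holds the matching mixing weights.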

merveenoyan · 2025-08-28T09:00:02Z

three-moes.md

+2. Apply MLPs (one per expert).
+3. Recombine outputs with routing weights.
+
+Below are **three ways** to compute MoE in PyTorch — from simple to complex.


Suggested change

Below are **three ways** to compute MoE in PyTorch — from simple to complex.

Below are **three ways** to build MoEs in PyTorch — from simple to complex.
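
For readers of this thread, the three steps named in the draft (route, apply one MLP per expert, recombine with routing weights) can be sketched as a plain loop over experts. This is a minimal sketch under stated assumptions, not the article's code: `naive_moe`, the two-layer GELU expert, and softmax over the selected logits are all hypothetical choices for illustration.

```python
import torch
from torch import nn


def naive_moe(tokens, router, experts, top_k):
    """Loop-over-experts MoE sketch; tokens has shape (num_tokens, hidden_size)."""
    # Step 1: routing. Each token picks top_k experts and gets normalised weights.
    router_logits = router(tokens)
    top_logits, top_experts = torch.topk(router_logits, top_k, dim=-1)
    routing_weights = torch.softmax(top_logits, dim=-1)

    output = torch.zeros_like(tokens)
    for expert_index, expert in enumerate(experts):
        # Rows (tokens) and top-k slots where this expert was selected.
        token_index, slot_index = torch.where(top_experts == expert_index)
        if token_index.numel() == 0:
            continue
        # Step 2: run this expert's MLP only on the tokens routed to it.
        expert_output = expert(tokens[token_index])
        # Step 3: scale by the routing weight and accumulate per token.
        weights = routing_weights[token_index, slot_index].unsqueeze(-1)
        output.index_add_(0, token_index, weights * expert_output)
    return output


torch.manual_seed(0)
hidden_size, intermediate_size, num_experts, top_k = 8, 16, 4, 2

router = nn.Linear(hidden_size, num_experts, bias=False)
experts = nn.ModuleList([
    nn.Sequential(
        nn.Linear(hidden_size, intermediate_size),
        nn.GELU(),
        nn.Linear(intermediate_size, hidden_size),
    )
    for _ in range(num_experts)
])

tokens = torch.randn(6, hidden_size)
moe_output = naive_moe(tokens, router, experts, top_k)
```

The loop is easy to read but launches one small matmul per expert, which is presumably why the draft goes on to more complex variants.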

merveenoyan · 2025-08-28T09:01:05Z

three-moes.md

+Quick test:
+
+```python
+hs = torch.randn(B, S, H)


Very nice to have simple reproducible snippets. It would be nice to put these into a notebook and link it at the end, so people don't have to copy-paste to try them.

ArthurZucker · 2025-08-29T15:35:28Z

three-moes.md

in general let's avoid single letter variable names and find meaningful ones!

feat: article exploring ways to compute MoE

78c56d8

merveenoyan reviewed Aug 28, 2025


ArthurZucker reviewed Aug 29, 2025




