Add Qwen Moe #2163

heyyanshuman · 2025-03-23T10:30:00Z

This PR adds the Qwen Mixture-of-Experts model to KerasHub.

Hugging Face reference: link

divyashreepathihalli

Just reviewed the MoE part of the code.

divyashreepathihalli · 2025-03-31T18:41:52Z

keras_hub/src/models/qwen/qwen_decoder.py

@@ -79,7 +79,7 @@ def build(self, decoder_sequence_shape):
        self.hidden_dim = decoder_sequence_shape[-1]

        # Self attention layer.
-        self._self_attention_layer = QwenAttention(
+        self._self_attention_layer = QwenMoeAttention(


I see you have forked this file into the qwen_moe folder. Why is this one being edited?

Looks like this slipped in during a find & replace; fixed it.

divyashreepathihalli · 2025-03-31T19:06:25Z

keras_hub/src/models/qwen_moe/qwen_moe_decoder.py

+
+        for expert_idx in range(self.num_experts):
+            expert_layer = self.experts[expert_idx]
+            idx, top_x = ops.where(expert_mask[expert_idx])


Will ops.where return a different output shape here on different forward passes? If so, that would not work with JAX/XLA.

I don't think that this should be the case. Why do you think so?

ops.where won't work on JAX at all, if you don't provide x and y.

ops.where calls jnp.where. Check the note here:

Because the size of the output of nonzero is data-dependent, the function is not compatible with JIT and other transformations. The JAX version adds the optional size argument which must be specified statically for jnp.nonzero to be used within JAX’s transformations.

You can, however, consider passing the size argument. That might make it work.
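To illustrate the point about `size`, here is a minimal JAX sketch; it is not code from this PR. The mask values and the choice of `fill_value=-1` are assumptions made for the example. With a static `size`, the single-argument form of `jnp.where` returns index arrays of a fixed length, so it can be traced under `jax.jit`. The padded entries then have to be masked out by the caller.

```python
import jax
import jax.numpy as jnp


@jax.jit
def routed_positions(expert_mask):
    # Without `size`, the output length depends on the mask values and
    # tracing under jit fails. A static `size` pads the result instead.
    # `expert_mask.size` is a Python int at trace time (worst-case length).
    rows, cols = jnp.where(expert_mask, size=expert_mask.size, fill_value=-1)
    return rows, cols


# Toy mask (assumed values): three True entries out of six.
mask = jnp.array([[True, False, True], [False, True, False]])
rows, cols = routed_positions(mask)
```

Padding to `expert_mask.size` covers the worst case; a tighter static bound would also work if one is known in advance.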

divyashreepathihalli · 2025-03-31T19:16:14Z

keras_hub/src/models/qwen_moe/qwen_moe_decoder.py

+        )
+        expert_mask = ops.transpose(expert_mask, axes=[2, 1, 0])
+
+        for expert_idx in range(self.num_experts):


The for loop over the experts here is inefficient for XLA compilation. This implementation would need to be updated. I tried out a dummy MoE implementation in JAX here: https://colab.sandbox.google.com/drive/1r0rscZK_2bNpDmFLC1POEEQoKcqWQYlQ
To bring this to KerasHub, we are missing a ragged_dot op.

I have prototyped the implementation - will add this op soon

When can we expect this to be available as part of Keras ops?
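One loop-free alternative can be sketched in JAX under simplifying assumptions. It is neither this PR's implementation nor the Colab prototype linked above. Each expert is assumed to be a single linear map stored in a stacked `w_experts` tensor, the top-k probabilities are not renormalized, and all names are hypothetical. Every expert is evaluated on every token and unselected experts are zeroed out through the gate weights, so all shapes are static.

```python
from functools import partial

import jax
import jax.numpy as jnp


@partial(jax.jit, static_argnames="top_k")
def dense_moe(x, router_logits, w_experts, top_k=2):
    # x: (tokens, hidden), router_logits: (tokens, experts),
    # w_experts: (experts, hidden, out). Hypothetical shapes.
    probs = jax.nn.softmax(router_logits, axis=-1)
    top_vals, top_idx = jax.lax.top_k(probs, top_k)  # (tokens, top_k)
    # Scatter the top-k probabilities into a dense (tokens, experts) gate
    # matrix; experts that were not selected get weight zero.
    one_hot = jax.nn.one_hot(top_idx, probs.shape[-1], dtype=x.dtype)
    gates = jnp.einsum("tk,tke->te", top_vals, one_hot)
    # Apply every expert to every token, then combine with the gates.
    expert_out = jnp.einsum("th,eho->teo", x, w_experts)
    return jnp.einsum("te,teo->to", gates, expert_out)


# Toy inputs (assumed): expert e scales its input by e + 1.
x = jnp.ones((2, 2))
w_experts = jnp.stack([jnp.eye(2) * s for s in (1.0, 2.0, 3.0)])
router_logits = jnp.array([[10.0, 0.0, 0.0], [0.0, 0.0, 10.0]])
out = dense_moe(x, router_logits, w_experts, top_k=1)
```

This trades extra compute for static shapes, since unselected experts still run. A grouped matmul such as `ragged_dot` is presumably meant to avoid that cost.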

heyyanshuman · 2025-04-02T11:10:26Z

@divyashreepathihalli How should we accommodate aux_loss for the CausalLM task model here?

We are specifying the SparseCategoricalCrossentropy loss here:

keras-hub/keras_hub/src/models/causal_lm.py

Lines 109 to 119 in b997444

        if optimizer == "auto":
            optimizer = keras.optimizers.Adam(2e-5)
        if loss == "auto":
            loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        if weighted_metrics == "auto":
            weighted_metrics = [keras.metrics.SparseCategoricalAccuracy()]
        super().compile(
            optimizer=optimizer,
            loss=loss,
            weighted_metrics=weighted_metrics,
            **kwargs,
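For context on the question: Keras layers can register extra loss terms with `add_loss`, and `fit` adds them to the compiled loss, so an auxiliary router loss does not have to go through `compile`. The sketch below shows what such a term might look like, using a Switch-Transformer-style load-balancing loss written in JAX. The function name and shapes are assumptions, and the thread does not say which formulation the PR uses.

```python
import jax
import jax.numpy as jnp


def load_balancing_loss(router_logits, num_experts, top_k):
    # router_logits: (tokens, num_experts). Hypothetical helper.
    probs = jax.nn.softmax(router_logits, axis=-1)
    _, top_idx = jax.lax.top_k(probs, top_k)  # (tokens, top_k)
    # One-hot routing decisions: (tokens, top_k, num_experts).
    expert_mask = jax.nn.one_hot(top_idx, num_experts)
    # Fraction of tokens sent to each expert, per top-k slot.
    tokens_per_expert = expert_mask.mean(axis=0)  # (top_k, num_experts)
    # Mean router probability assigned to each expert.
    prob_per_expert = probs.mean(axis=0)  # (num_experts,)
    return num_experts * jnp.sum(tokens_per_expert * prob_per_expert[None, :])


# Uniform router (assumed toy input): every expert gets probability 1/4.
logits = jnp.zeros((8, 4))
aux = load_balancing_loss(logits, num_experts=4, top_k=2)
```

With a uniform router this term evaluates to `top_k`. Inside a Keras layer, a scaled version of the value could be passed to `self.add_loss(...)` during training; the scaling coefficient would be a model hyperparameter.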

divyashreepathihalli

Thanks for the updates @heyyanshuman!
I left some comments on the PR regarding tf ops.
Please add tests for the layers, backbones and tasks.
I am curious to know whether model.fit works. Do you have a demo Colab for inference and fine-tuning? I am looking for the aux loss implementation.

divyashreepathihalli · 2025-04-10T01:07:18Z

keras_hub/src/models/qwen_moe/qwen_moe_attention.py

+        )
+        self._query_dense.build(inputs_shape)
+
+        self._key_dense = keras.layers.EinsumDense(


You might want to rename this to match the other KH models here: value_dense and query_dense.
This will allow enabling LoRA on this model: https://docs.google.com/document/d/1BSjDMuSP9N0e2sw83E5ujSbiwiDpoKzMBLCRN9xdtLY/edit?resourcekey=0-OMMpYhmIDBMxQtVuV5saSg&tab=t.0#heading=h.fttfr1z4ln91

I don't have access to this document :(

divyashreepathihalli · 2025-04-10T01:21:44Z

keras_hub/src/models/qwen_moe/qwen_moe_causal_lm.py

+@keras_hub_export(
+    "keras_hub.models.QwenMoeCausalLM",
+)
+class QwenMoeCausalLM(CausalLM):


Add a docstring and an example showing model.fit and generate.

heyyanshuman and others added 3 commits March 23, 2025 15:59

qwen moe init commit

1256614

wip

4e1d714

Merge branch 'keras-team:master' into qwen-moe

df0c409

heyyanshuman self-assigned this Mar 29, 2025

heyyanshuman requested review from mattdangerw, abheesht17 and divyashreepathihalli March 29, 2025 05:06

heyyanshuman marked this pull request as ready for review March 29, 2025 05:06

heyyanshuman added 3 commits March 29, 2025 13:34

wip

20be536

weight conversion wip

6986253

weight matching complete

d391cd2

heyyanshuman force-pushed the qwen-moe branch from c76184e to d391cd2 Compare March 29, 2025 08:04

update the docstrings + configs

1d1f18d

mattdangerw removed the request for review from divyashreepathihalli March 31, 2025 16:41

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Mar 31, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 31, 2025

divyashreepathihalli reviewed Mar 31, 2025

remove incorrect import

a597b82

heyyanshuman and others added 6 commits April 8, 2025 17:16

wip

bd17346

updates

abc2ad2

updates

c423410

updates

8d3d89a

Merge branch 'keras-team:master' into qwen-moe

eea62f0

bug fix

e85c404

divyashreepathihalli reviewed Apr 10, 2025

heyyanshuman added 4 commits April 10, 2025 14:10

add aux loss

a3fc50d

address comments

5e175b2

causal lm test

d87601e

add tests

68396cf

heyyanshuman added 3 commits April 13, 2025 09:57

add docstrings

7da4e10

small bug fix

9424adc

bug fix in aux loss

68ebb71

heyyanshuman requested a review from divyashreepathihalli April 14, 2025 05:41
