Expert Parallelism (EP) Support for DeepSeek V2 #12583
base: main
Conversation
Left some comments, but overall LGTM.
We need to check the user interface, but I feel that right now we can just reuse the TP size for EP. In the future, when we have DP, the EP size will automatically be DP x TP.
Can you please elaborate on this a bit? MLA + CUDA graphs + TP is working fine on main as far as I am aware.
@youkaichao The current design supports TP within an EP group, but we can easily change that to apply EP only to the MoE layers. I think we will need more discussion with others on that design decision; the current EP+TP implementation is based on a discussion with @WoosukKwon and @simon-mo. @LucasWilkinson I just merged the main branch, and CUDA graph is now working with EP+TP.
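To make the "TP within an EP group" layout concrete, here is a toy sketch. This is not the PR's code, and the EP-major rank-grouping order is an assumption; it only illustrates how a world of ranks could be split into EP groups that each internally run a full TP group.

```python
# Toy sketch of a "TP within an EP group" rank layout (illustrative only;
# the actual grouping order used by the PR is an assumption here).
world_size, ep_size = 16, 2
tp_size = world_size // ep_size  # 8 TP ranks inside each EP group

for rank in range(world_size):
    # EP-major layout: consecutive ranks share an EP group and form its TP group.
    ep_rank, tp_rank = divmod(rank, tp_size)
    print(f"rank {rank:2d} -> EP group {ep_rank}, TP rank {tp_rank}")
```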
The current vLLM execution only supports TP when running MoE models.
This PR adds support for Expert Parallelism (EP) to the FusedMoE kernel and the DeepSeek V2 model, which should be extendable to V3 and other MoE models as well.

Use the `--expert_parallel_size` engine argument to specify the EP size the FusedMoE kernel uses (a usage sketch follows below).

EP currently does not work with CUDA graph, but the problem appears to be in the current attention layer (MLA?) implementation rather than in EP itself: I tried CUDA graph + EP on the previous attention layer implementation (commit 60284b5), and it works fine.
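A minimal usage sketch, assuming the new engine argument is also exposed as an `expert_parallel_size` keyword on the offline `LLM` entrypoint (the PR only names the engine argument, so the exact Python-API spelling and the model choice here are assumptions):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # illustrative model choice
    tensor_parallel_size=8,                # TP, as on main
    expert_parallel_size=8,                # assumed kwarg for the new EP size
    enforce_eager=True,                    # CUDA graphs not yet working with EP (see above)
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```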