
Conversation

@xyang16 (Contributor) commented Jul 17, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

This PR adds support for running EAGLE speculative decoding on DeepSeek models. It changes the following files:

  • deepseek_eagle.py: DeepSeek EAGLE draft model definition
  • registry.py: add the model to the registry (a hypothetical sketch of such an entry is shown below)
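
For illustration only, a registry entry of this kind typically maps an architecture string to the module and class implementing the draft model; the architecture name and class name below are assumptions, not necessarily the exact identifiers this PR registers:

# Hypothetical sketch of the registry.py addition (names are assumptions).
# vLLM's model registry maps an architecture string to (module_name, class_name),
# so the draft model resolves to deepseek_eagle.py when the config requests it.
_SPECULATIVE_DECODING_MODELS = {
    # ... existing entries, e.g. the Llama EAGLE draft model ...
    "EagleDeepseekForCausalLM": ("deepseek_eagle", "EagleDeepseekForCausalLM"),
}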

Test Plan

Added test in test_spec_decode.py:

pytest -s -v tests/v1/e2e/test_spec_decode.py

vllm serve command:

export VLLM_USE_V1=1
export VLLM_MLA_DISABLE=1
vllm serve deepseek-ai/DeepSeek-R1 \
    --port 8080 \
    --tensor-parallel-size 8 \
    --max-model-len 8192 \
    --max-num-seqs 8 \
    --trust-remote-code \
    --speculative-config '{"method": "eagle", "model":"eagle618/eagle-deepseek-r1", "num_speculative_tokens": 3}'
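
Once the server is up, it can be queried through the standard OpenAI-compatible API that vllm serve exposes; a minimal client sketch (the prompt and sampling parameters are arbitrary, not part of the PR):

# Minimal client sketch against the server started above (port 8080).
# Assumes the `openai` Python package; prompt and parameters are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

response = client.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    prompt="Explain speculative decoding in one paragraph.",
    max_tokens=128,
    temperature=0.0,
)
print(response.choices[0].text)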

Test Result

The following unit tests passed:

tests/v1/e2e/test_spec_decode.py::test_eagle_correctness[FLASH_ATTN_VLLM_V1-deepseek_eagle]
tests/v1/e2e/test_spec_decode.py::test_eagle_correctness[TREE_ATTN-deepseek_eagle]

Serving deepseek-ai/DeepSeek-R1 and benchmarking with llmperf:

INFO 07-16 18:05:38 [metrics.py:87] SpecDecoding metrics: Draft acceptance rate: 58.4%, Mean acceptance length: 2.75, Accepted: 163 tokens, Drafted: 279 tokens, Per-position acceptance rate: 0.882, 0.559, 0.312
INFO 07-16 18:08:09 [metrics.py:87] SpecDecoding metrics: Draft acceptance rate: 63.3%, Mean acceptance length: 2.90, Accepted: 167 tokens, Drafted: 264 tokens, Per-position acceptance rate: 0.886, 0.670, 0.341
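
As a quick sanity check on these logs, the mean acceptance length is consistent with one bonus token plus the sum of the per-position acceptance rates:

# Consistency check on the logged SpecDecoding metrics:
# mean acceptance length ~= 1 (bonus token) + sum of per-position acceptance rates.
for rates in ([0.882, 0.559, 0.312], [0.886, 0.670, 0.341]):
    print(round(1 + sum(rates), 2))  # -> 2.75 and 2.9, matching the logs above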

(Optional) Documentation Update

@mergify bot added the deepseek (Related to DeepSeek models), new-model (Requests to new models), and speculative-decoding labels on Jul 17, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for Eagle speculative decoding with Deepseek models. I've found a few critical issues in the implementation that will prevent it from working correctly. The model implementation in deepseek_eagle.py incorrectly handles hidden state dimensions and is missing the lm_head layer, which will cause runtime errors. Additionally, the model registry key in registry.py seems to be incorrect, which would prevent the model from being loaded.

Comment on lines +62 to +73
self.fc = nn.Linear(
    self.config.model.hidden_size * 2,
    self.config.model.hidden_size,
    bias=False,
)

self.enorm = RMSNorm(self.config.hidden_size,
                     eps=self.config.rms_norm_eps)
self.hnorm = RMSNorm(self.config.hidden_size,
                     eps=self.config.rms_norm_eps)
self.norm = RMSNorm(self.config.hidden_size,
                    eps=self.config.rms_norm_eps)

critical

The implementation of DeepseekV2Model assumes that the draft model and the target model share the same hidden size. For instance, self.hnorm is initialized with the draft model's hidden size (self.config.hidden_size) but is applied to hidden_states from the target model.

This assumption is incorrect for the models used in testing (deepseek-r1 has a hidden size of 4096, while eagle-deepseek-r1 has 1024), and will lead to a runtime error due to shape mismatch.

To fix this, you should explicitly use the hidden sizes from both the draft and target model configurations. You can access the target model's configuration via vllm_config.model_config.

        target_config = vllm_config.model_config.hf_config
        draft_hidden_size = self.config.hidden_size
        target_hidden_size = target_config.hidden_size

        self.fc = nn.Linear(
            draft_hidden_size + target_hidden_size,
            draft_hidden_size,
            bias=False,
        )

        self.enorm = RMSNorm(draft_hidden_size,
                             eps=self.config.rms_norm_eps)
        self.hnorm = RMSNorm(target_hidden_size,
                             eps=target_config.rms_norm_eps)
        self.norm = RMSNorm(draft_hidden_size,
                            eps=self.config.rms_norm_eps)


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@Ja1Zhou commented Jul 17, 2025

Hi! I tried installing this pr from source. But got

OSError: eagle618/eagle-deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co/eagle618/eagle-deepseek-r1/tree/main' for available files.

Should the auto_map field of config.json be fixed?

@xyang16 (Contributor, Author) commented Jul 17, 2025

Hi! I tried installing this pr from source. But got

OSError: eagle618/eagle-deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co/eagle618/eagle-deepseek-r1/tree/main' for available files.

Should the auto_map field of config.json be fixed?

Thanks for your comment, fixed now.

@xyang16 changed the title from [v1] Support deepseek with eagle to [Model] Support deepseek with eagle on Jul 17, 2025
@Ja1Zhou commented Jul 17, 2025

Hi! I tried installing this pr from source. But got

OSError: eagle618/eagle-deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co/eagle618/eagle-deepseek-r1/tree/main' for available files.

Should the auto_map field of config.json be fixed?

Thanks for your comment, fixed now.

Amazing work!

I wonder if you could share how you got eagle618/eagle-deepseek-r1? This PR could also improve DS V3, etc. Thank you!

@xyang16 force-pushed the eagle branch 4 times, most recently from 6981998 to c4cda03, on July 18, 2025 02:05
@aarnphm (Collaborator) left a comment

Do you have an EAGLE checkpoint to test this with? If you have some numbers, that would be great.

Llama 4 EAGLE landed recently, so I do think we can probably do the same with regard to tests for this (if you need some references for test cases).

@xyang16 (Contributor, Author) commented Jul 29, 2025

Do you have an EAGLE checkpoint to test this with? If you have some numbers, that would be great.

Llama 4 EAGLE landed recently, so I do think we can probably do the same with regard to tests for this (if you need some references for test cases).

Thanks for your review. I have tested with the checkpoint eagle618/eagle-deepseek-r1.

Also added a unit test case.

stacked_params_mapping = [
    # (param_name, shard_name, shard_id)
    ("gate_up_proj", "gate_proj", 0),
    ("gate_up_proj", "up_proj", 1),
Collaborator:

Does this need to be made compatible with the fused_qkv_a_proj optimization from #21116? I have observed multiple issues with weight loading in MTP not being consistent with the DeepSeek base model weight loading. Will similar issues apply here?

Contributor Author (@xyang16):

I have updated the stacked_params_mapping. Thanks!
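
For context on what such a mapping does, a hedged sketch follows; the fused_qkv_a_proj shard names are assumptions based on the base DeepSeek model loader from #21116, not necessarily this PR's final list:

# Hedged sketch: each tuple is (fused param name, checkpoint shard name, shard index).
# The weight loader uses this to fold separate checkpoint tensors into one fused parameter.
stacked_params_mapping = [
    ("gate_up_proj", "gate_proj", 0),
    ("gate_up_proj", "up_proj", 1),
    ("fused_qkv_a_proj", "q_a_proj", 0),                # assumption, mirrors base model
    ("fused_qkv_a_proj", "kv_a_proj_with_mqa", 1),      # assumption, mirrors base model
]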

hidden_states = self.fc(inputs)

# masking inputs at position=0
hidden_states[positions == 0] = 0
Collaborator:

There have been many discussions in the community about how to properly handle the rotated input slot, but this does not seem in line with the final state. If I recall correctly, there was concern that overwriting the hidden states to zero will give out-of-distribution results during attention. See the other EAGLE implementations in vLLM (such as llama_eagle.py) for reference.

Contributor Author (@xyang16), Aug 14, 2025:

Yes, there has been discussion that this will mess up the attention normalization. I have removed it. Please review. Thanks.

Contributor Author (@xyang16):

I see deepseek_mtp.py also masks the hidden states to 0: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/deepseek_mtp.py#L74

I can remove that line in deepseek_mtp.py in this PR as well, or I can leave it there. Let me know what you think.

Collaborator:

Perhaps it is best to leave it pending a complete study of the impact on AL (acceptance length) for MTP. If there isn't a GitHub issue for this task, please create one.


inputs = torch.cat(
    [self.enorm(input_embeds),
     self.hnorm(hidden_states)], dim=-1)
Collaborator:

This looks like too many norms being applied. In the Llama_Eagle reference code, the input layernorm to each layer is disabled, and IIRC there is no output layernorm. Here, there are two norms applied to the input (pre-concat and input-layernorm after concat) and two more norms applied after (post_attention_layernorm and self.norm). This does not seem correct.

Contributor Author (@xyang16), Aug 14, 2025:

I have taken a look at deepseek_mtp.py at https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/deepseek_mtp.py#L64. The only difference is the output self.norm. In our benchmarking, we found that including the output norm increases the acceptance rate.
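
For readers following the norm discussion, here is a minimal, hypothetical sketch of the combine-and-norm structure being described. It omits attention/MLA details and weight loading, uses torch.nn.RMSNorm (PyTorch >= 2.4) in place of vLLM's RMSNorm, and the class and layer names may differ from the PR:

import torch
import torch.nn as nn

class EagleCombiner(nn.Module):
    """Simplified sketch: fuse draft token embeddings with target hidden states."""

    def __init__(self, draft_hidden: int, target_hidden: int, eps: float = 1e-6):
        super().__init__()
        self.enorm = nn.RMSNorm(draft_hidden, eps=eps)    # norm on token embeddings
        self.hnorm = nn.RMSNorm(target_hidden, eps=eps)   # norm on target hidden states
        self.fc = nn.Linear(draft_hidden + target_hidden, draft_hidden, bias=False)
        self.norm = nn.RMSNorm(draft_hidden, eps=eps)     # output norm discussed above

    def forward(self, input_embeds: torch.Tensor,
                target_hidden_states: torch.Tensor) -> torch.Tensor:
        fused = torch.cat(
            [self.enorm(input_embeds), self.hnorm(target_hidden_states)], dim=-1)
        hidden = self.fc(fused)
        # ... the draft model's decoder layer(s) would run here ...
        return self.norm(hidden)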

@benchislett (Collaborator) commented:

Acceptance rate is around 60% for vanilla eagle head (and better acceptance rate for fine tuned eagle head).

What does this mean? What are the weights of a "vanilla eagle head" in this case?

@xyang16 force-pushed the eagle branch 2 times, most recently from 582dc36 to e26554a, on August 14, 2025 23:37
@xyang16 (Contributor, Author) commented Aug 14, 2025

Acceptance rate is around 60% for vanilla eagle head (and better acceptance rate for fine tuned eagle head).

What does this mean? What are the weights of a "vanilla eagle head" in this case?

@benchislett Thanks for your review! I mean that people can further fine-tune the weights and get a better acceptance rate. The "vanilla eagle head" in this case is eagle618/eagle-deepseek-r1.

@xyang16 force-pushed the eagle branch 2 times, most recently from 498955f to 7d3a40f, on August 15, 2025 03:30
@benchislett (Collaborator) left a comment

The implementation now seems more in line with the MTP implementation. There are still differences between how we handle EAGLE and MTP models (for example, whether norms are applied via input_layernorm, or whether the output is normed), and this PR blends the two by implementing an EAGLE class in a manner more consistent with MTP.

We should try to find some way to unify implementations and reconcile the differences, but this is probably not the PR to bear that burden. For now, this will suffice and can be extended if future eagle-style MTP modules are released with slight differences in implementation.

@xyang16 requested a review from aarnphm on August 18, 2025 22:06
@simon-mo (Collaborator) left a comment

Stamping given Benjamin approved.

@simon-mo enabled auto-merge (squash) on August 19, 2025 22:06
@github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Aug 19, 2025
Signed-off-by: Xin Yang <[email protected]>
auto-merge was automatically disabled August 20, 2025 05:27

Head branch was pushed to by a user without write access

@DarkLight1337 merged commit 83e69a0 into vllm-project:main on Aug 20, 2025
42 checks passed
@xyang16 deleted the eagle branch on August 20, 2025 11:04
@gyou2021 commented Sep 1, 2025

Why export VLLM_MLA_DISABLE=1? Is it OK to run DeepSeek-R1 inference with VLLM_MLA_DISABLE=0 and deepseek_eagle with VLLM_MLA_DISABLE=0?
