[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 #24392
Conversation
Code Review

This pull request correctly fixes an IndexError that occurred with eagle3 speculative decoding models. The fix in vllm/model_executor/models/llama.py properly adds a bounds check before accessing the layer_types list, preventing a crash. The corresponding change in vllm/model_executor/models/registry.py to re-enable the LlamaForCausalLMEagle3 model is a logical consequence of this bug fix. The changes are accurate and well-implemented, and I have no further recommendations.
Shall we re-enable everything that was disabled in #22611?
Sure. Updated in the new commit. Local pytest has passed.
Signed-off-by: wwl2755 <[email protected]>
LGTM - left a nit
Signed-off-by: wwl2755 <[email protected]>
vllm/model_executor/models/llama.py (outdated)

         sliding_window = None
    -    if layer_types := getattr(config, "layer_types", None):
    +    if (layer_types := getattr(config, "layer_types",
    +                               None)) and layer_idx < len(layer_types):
Should we instead have an assert that layer_idx < len(layer_types)? That would help catch bugs that would silently get past the current if check and might manifest as a quality drop if layer_idx goes beyond layer_types due to a misconfiguration.
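A minimal sketch of the assert-based variant being suggested here. The body under the check (the "sliding_attention" lookup and the sliding_window assignment) is assumed for illustration and is not copied from the PR; config and layer_idx are the same names used in the diff above.

```python
sliding_window = None
layer_types = getattr(config, "layer_types", None)
if layer_types is not None:
    # Fail loudly on a misconfigured draft instead of silently skipping the check.
    assert layer_idx < len(layer_types), (
        f"layer_idx {layer_idx} is out of range for layer_types "
        f"of length {len(layer_types)}")
    if layer_types[layer_idx] == "sliding_attention":
        # Assumed body: only apply SWA when this layer is marked as sliding attention.
        sliding_window = getattr(config, "sliding_window", None)
```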
Also, which model is using sliding window in the eagle head?
In AngelSlim/Qwen3-14B_eagle3, the config has layer_types: ['full_attention'] and layer_idx: 40. If we use an assertion, it would stop here.
Is it correct to assume that a layer_idx of 40 is the layer index corresponding to the eagle3 draft head? What is the value of len(layer_types) when layer_idx is 40?
I printed the numbers right before the bug:

    (EngineCore_0 pid=830911) layer_types: ['full_attention']
    (EngineCore_0 pid=830911) layer_idx: 40
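A minimal repro of the out-of-range access using the values from the log above; len(layer_types) is 1, so indexing with 40 fails.

```python
layer_types = ["full_attention"]   # from the AngelSlim/Qwen3-14B_eagle3 log above
layer_idx = 40                     # draft-head layer index per the discussion above

try:
    attn_type = layer_types[layer_idx]
except IndexError as e:
    print(f"IndexError: {e}")      # list index out of range -> the crash this PR fixes
```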
> which model is using sliding window in the eagle head?

None, in my understanding; the check is only there to avoid entering that codepath.
The current fix won't work for a draft that has SWA. The fix bypasses applying the SWA if the draft has layer_types: ['sliding_attention']. The foolproof fix would be to redefine layer_types in the draft as draft.layer_types = target.layer_types + draft.layer_types, and this new value would be passed to llama.py from here.
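A rough, self-contained sketch of that proposal. The SimpleNamespace stand-ins, the 40-layer target, and the ['sliding_attention'] draft entry are taken from this discussion for illustration; the real config objects and attribute names in vLLM may differ.

```python
from types import SimpleNamespace

# Stand-ins for the target and draft HF configs (illustrative only).
target_config = SimpleNamespace(layer_types=["full_attention"] * 40)
draft_config = SimpleNamespace(layer_types=["sliding_attention"])

# Prepend the target's layer_types so the draft head's layer_idx (40 in the
# example above) indexes the draft's own entry instead of falling out of range.
draft_config.layer_types = list(target_config.layer_types) + list(draft_config.layer_types)

print(len(draft_config.layer_types))      # 41
print(draft_config.layer_types[40])       # 'sliding_attention'
```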
Right. But I'm not sure whether other eagle3 models hold this assumption. Let me run some tests.
    # Eagle model name should follow naming convention of
    # LlamaForCausalLM -> EagleLlamaForCausalLM
    # LlamaForCausalLM -> Eagle3LlamaForCausalLM
    # LlamaForCausalLMEagle3 -> LlamaForCausalLMEagle3
I didn't get this part: why do we need both LlamaForCausalLMEagle3 and Eagle3LlamaForCausalLM to represent the same thing, i.e., llama Eagle 3?
This comment means we are able to handle both cases. There are two different names because the people who create the eagle models define them that way; we cannot control what they have in the HF repo.
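An illustrative sketch of how the registry can map both architecture strings to the same implementation. The dict name, module name, and class name below are assumptions for illustration, not the exact entries in vllm/model_executor/models/registry.py.

```python
# Both HF architecture strings resolve to the same eagle3 draft implementation.
_SPECULATIVE_DECODING_MODELS = {
    "Eagle3LlamaForCausalLM": ("llama_eagle3", "Eagle3LlamaForCausalLM"),
    "LlamaForCausalLMEagle3": ("llama_eagle3", "Eagle3LlamaForCausalLM"),
}
```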
But I definitely agree we should unify it (maybe in a future PR). It took me some time to understand it.
I see. Just to confirm:
1. LlamaForCausalLM and LlamaForCausalLMEagle3 are model names defined in HF?
2. There is no LlamaForCausalLMEagle to define EAGLE 1 in HF?
1. Yes. Both yuhuili/EAGLE-LLaMA3.1-Instruct-8B and yuhuili/EAGLE3-LLaMA3.1-Instruct-8B used LlamaForCausalLM, even though they are eagle1 and eagle3. Newer models like AngelSlim/Qwen3-8B_eagle3 use LlamaForCausalLMEagle3.
2. None, to my understanding. If there were a popular one, there would already be a bug report.
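For reference, one way to check which architecture string a draft checkpoint declares. The repo name is taken from the comment above, and the expected output is based on this discussion rather than verified here.

```python
import json
from huggingface_hub import hf_hub_download

# Download only the config.json of the draft checkpoint and inspect its architectures field.
path = hf_hub_download("yuhuili/EAGLE3-LLaMA3.1-Instruct-8B", "config.json")
with open(path) as f:
    print(json.load(f).get("architectures"))  # expected per the discussion: ['LlamaForCausalLM']
```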
Signed-off-by: wwl2755 <[email protected]>
@ekagra-ranjan Updated according to your suggestion, PTAL. I changed it a little bit by using … and tested with ….
Thanks for fixing SWA as well. LGTM
This PR is causing Basic Models Test to fail on main.
Hi @DarkLight1337, do you know how to reproduce this in the same env as the CI? I pulled the latest main, ran it directly, and it works fine e2e. 😕
Maybe you can use the error log to help debug: https://buildkite.com/vllm/ci/builds/30179#01993427-9309-4bf2-a37a-e63f8b87dc4f
Thanks! It could be related to how we use the layer_idx. I'm looking closer at it.
Hi @ggg-s I replied to you at: #23464 (comment)
My CI is failing on a PR (#24127) that doesn't touch the code here: https://buildkite.com/vllm/ci/builds/30242#01993572-2b87-4ab4-a155-8c7b1a4a6570. Can we disable the CI test again, as it's blocking forward progress for other PRs?
Just a sec, I'm on the final testing of a PR to fix it. I saw your PR is only metric-related, so it should be okay to go if this is the only failure; the failure is known, and you can ask in the Slack. Sorry for the inconvenience.
Fixing PR in: #24613
Purpose
Fix qwen3 Models in eagle3 Speculative Decoding #23464

cc: @lec77 @22quinn
Test Result
Before PR: …

After PR: Server successfully launched.

Command: …

Unit tests: …
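For reference, a hypothetical way to run the re-enabled check locally; the exact test file and selector from the PR description were not captured in this excerpt, so the path and keyword below are assumptions.

```python
# Hypothetical invocation via pytest's Python API; the real test path/command may differ.
import pytest

pytest.main(["-q", "tests/models/test_initialization.py", "-k", "LlamaForCausalLMEagle3"])
```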
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.