
Conversation

@baonudesifeizhai (Contributor) commented Sep 8, 2025

Purpose

This PR adds EAGLE3 speculative decoding support for GPT-OSS models, speeding up inference by having a lightweight draft head propose several tokens per step that the target model then verifies in parallel.

  1. Model architecture support: register the LlamaForCausalLMEagle3 draft architecture
  2. GPT-OSS model integration: implement the SupportsEagle3 interface on GptOssForCausalLM
  3. Configuration fixes: handle empty/"." quantization methods and missing draft_vocab_size
  4. EAGLE3 implementation: auxiliary hidden-state extraction and draft/target dimension handling
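
For context, the kind of invocation this targets looks roughly like the sketch below. It is a minimal offline example, not taken from this PR: the draft checkpoint path is a placeholder and the speculative_config field names should be checked against the installed vLLM version.

```python
from vllm import LLM, SamplingParams

# Hedged sketch: the draft model path below is a placeholder, and the
# speculative_config keys follow vLLM's EAGLE examples from memory.
llm = LLM(
    model="openai/gpt-oss-120b",
    speculative_config={
        "method": "eagle3",
        "model": "path/to/gpt-oss-120b-eagle3-draft",  # hypothetical checkpoint
        "num_speculative_tokens": 3,
    },
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```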

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

baonudesifeizhai and others added 18 commits September 7, 2025 19:53
- Add LlamaForCausalLMEagle3 class in llama_eagle3.py
- Update registry to support LlamaForCausalLMEagle3 architecture
- Re-enable LlamaForCausalLMEagle3 in test configurations
- Add qwen3_eagle3 test case back to spec_decode tests

This enables support for GPT-OSS-120B Eagle3 speculative decoding models.
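
For orientation, the registry change described above follows vLLM's convention of mapping an HF architecture string to a (module, class) pair. This is a sketch from memory; the exact dict name and neighboring entries may differ, and a later commit in this PR re-points the new entry at Eagle3LlamaForCausalLM to match main.

```python
# vllm/model_executor/models/registry.py (sketch; names approximate)
_SPECULATIVE_DECODING_MODELS = {
    "EagleLlamaForCausalLM": ("llama_eagle", "EagleLlamaForCausalLM"),
    "Eagle3LlamaForCausalLM": ("llama_eagle3", "Eagle3LlamaForCausalLM"),
    # New: architecture string used by GPT-OSS Eagle3 draft checkpoints.
    "LlamaForCausalLMEagle3": ("llama_eagle3", "LlamaForCausalLMEagle3"),
}
```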
- Handle empty quantization method string in SpeculativeConfig
- Fix code formatting in LlamaForCausalLMEagle3 class

This resolves the 'Unknown quantization method: .' error when using
GPT-OSS-120B Eagle3 speculative decoding models.
- Add 'gpt_oss' to eagle3_target_supported list
- Add field_validator for SpeculativeConfig quantization field
- Handle empty quantization method strings properly

This should resolve the remaining quantization validation errors
when using GPT-OSS-120B Eagle3 models.
- Move field_validator to correct position after field definitions
- This should properly handle empty quantization method strings
- Resolves the remaining 'Unknown quantization method: .' error
- Add None to QuantizationMethods Literal type
- Handle '.' quantization method string in field_validator
- This should resolve the 'Unknown quantization method: .' error

The issue was that QuantizationMethods Literal type didn't include None,
and the Eagle3 model config contains a '.' quantization method value.
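
A minimal sketch of the validator these commits describe, assuming a pydantic-style SpeculativeConfig; the class shown is a stand-in, but the validator name matches the one discussed later in this thread.

```python
from typing import Any, Optional

from pydantic import BaseModel, field_validator


class SpeculativeConfigSketch(BaseModel):
    """Illustrative stand-in for vLLM's SpeculativeConfig."""

    quantization: Optional[str] = None

    @field_validator("quantization", mode="before")
    @classmethod
    def _validate_quantization_method(cls, value: Any) -> Any:
        # Some Eagle3 draft configs carry "" or "." as quant_method;
        # normalize those to None so the allowed-methods check passes.
        if isinstance(value, str) and value.strip() in ("", "."):
            return None
        return value
```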
- Filter out None values from QUANTIZATION_METHODS list
- Fix line length formatting issue
- Resolve KeyError when quantization is None
- Fix spacing and line formatting issues
- Ensure consistent code style
- Handle empty string and dot (.) quantization values
- Convert them to None to prevent validation errors
- Apply same logic as SpeculativeConfig validator
- Add empty string ('') and dot ('.') as valid quantization methods
- This allows Pydantic validation to pass before field_validator processes them
- Fixes the 'Unknown quantization method' validation error
- Only set quantization if quant_method is not empty string
- Prevents empty string from being assigned to self.quantization
- Fixes the 'Unknown quantization method' validation error
- Addresses the root cause of the quantization validation issue
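
Roughly, the change in ModelConfig._verify_quantization amounts to guarding that assignment. The helper below is a self-contained sketch of the same idea, not the actual vLLM code.

```python
from typing import Optional


def normalize_quant_method(quant_cfg: Optional[dict]) -> Optional[str]:
    """Sketch of the guard described above: return a quantization method
    only when the HF quant config actually names one."""
    if not quant_cfg:
        return None
    quant_method = quant_cfg.get("quant_method", "")
    # An empty string would later trip the "Unknown quantization method"
    # check, so treat it the same as no quantization at all.
    return quant_method or None
```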
- Add _ensure_draft_vocab_size method to set draft_vocab_size from target model vocab_size
- Prevents TypeError when draft_vocab_size is None in ParallelLMHead initialization
- Ensures Eagle3 models have proper vocabulary size configuration
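
A rough sketch of the helper the commit above introduces; the attribute names on the draft and target configs are assumptions.

```python
def _ensure_draft_vocab_size(draft_config, target_config) -> None:
    """Sketch: make sure the Eagle3 draft config has a usable vocab size."""
    # Without this, ParallelLMHead would be constructed with
    # num_embeddings=None and raise a TypeError.
    if getattr(draft_config, "draft_vocab_size", None) is None:
        draft_config.draft_vocab_size = target_config.vocab_size
```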
- Break long lines to comply with 80 character limit
- Improve code readability and maintain style consistency
- Use yapf to automatically format code according to project standards
- Improve code consistency and readability
…patibility

- Add SupportsEagle3 interface to GptOssForCausalLM class
- Implement set_aux_hidden_state_layers and get_eagle3_aux_hidden_state_layers methods
- Fix embedding attribute compatibility in Eagle3 loader to handle both embed_tokens and embedding
- Support dynamic detection of embedding layer names across different model architectures
- Remove unused variable to fix linting errors
- Implement proper layer selection for auxiliary hidden state extraction
- Fix forward method to return auxiliary states when EAGLE3 is enabled
- Use middle layers for auxiliary state extraction (common EAGLE3 pattern)
- Apply yapf formatting to fix line length issues
- Ensure compatibility with EAGLE3's expectation of tuple return values
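
A condensed sketch of the GPT-OSS wiring described above. The method names come from the commit messages (SupportsEagle3, set_aux_hidden_state_layers, get_eagle3_aux_hidden_state_layers); the import path, attribute names, and the specific layer-selection heuristic are assumptions.

```python
from torch import nn

# Import path assumed; the Eagle3 support mixin lives alongside the other
# model interfaces in vLLM.
from vllm.model_executor.models.interfaces import SupportsEagle3


class GptOssForCausalLM(nn.Module, SupportsEagle3):
    # Trimmed to the Eagle3 plumbing only.

    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None:
        # Record which decoder layers should also expose their hidden
        # states so they can be fed to the Eagle3 draft head.
        self.model.aux_hidden_state_layers = layers

    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
        # "Middle layers" heuristic from the commit message: one early,
        # one middle, one late layer.
        num_layers = len(self.model.layers)
        return (2, num_layers // 2, num_layers - 3)
```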
@houseroad (Collaborator) left a comment


Can we add some perf numbers to the PR description?

@baonudesifeizhai (Contributor, Author)

Can we add some perf numbers to the PR description?

Sure. Currently testing whether removing the def _validate_quantization_method(value: Any) -> Any: validator works or not.

@frank-wei (Contributor)

Is the draft model trained in-house, or is it an OSS one?

@zixi-qi (Collaborator) commented Sep 9, 2025

_validate_quantization_method

I ran into same issue with the EAGLE3 model from NVIDIA and fixed it with this change

diff --git a/vllm/config/__init__.py b/vllm/config/__init__.py

def iter_architecture_defaults():
     yield from _SUFFIX_TO_DEFAULTS
@@ -1120,6 +1139,8 @@ class ModelConfig:
                 elif quant_algo is not None:
                     raise ValueError(
                         f"Unknown ModelOpt quant algo: {quant_algo}")
+                else:
+                    quant_cfg = None
 
         return quant_cfg

@baonudesifeizhai (Contributor, Author)

_validate_quantization_method

I ran into same issue with the EAGLE3 model from NVIDIA and fixed it with this change

[screenshot] Still facing a TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'. Full log: https://pastebin.ubuntu.com/p/q2Ft53cKY9/

- Add GPT-OSS specific configuration handling in _setup_model_specific_config
- Ensure fc layer input dimensions match target model's hidden size
- Fix RuntimeError: mat1 and mat2 shapes cannot be multiplied

This resolves the dimension mismatch between GPT-OSS-120B (16128) and
EAGLE3 model expectations (8640x2880).
- Add debug prints to understand fc layer dimensions
- Help diagnose matrix dimension mismatch issue
- Remove unused _setup_model_specific_config method
- Remove debug print statements
- Keep only essential combine_hidden_states override for dimension mismatch handling
- Simplify the class to focus on core functionality
- Ensure dynamically created fc layer is on the same device as hidden_states
- Add .to(device) to move the new Linear layer to GPU
- Resolves RuntimeError: Expected all tensors to be on the same device
- Add target_hidden_size configuration in _ensure_draft_vocab_size method
- Ensure EAGLE3 model uses correct input dimensions for fc layer
- Remove runtime dimension mismatch fixes (no longer needed)
- Fix code formatting issues

This should resolve the matrix dimension mismatch errors for GPT-OSS-120B with EAGLE3

mergify bot commented Sep 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @baonudesifeizhai.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Sep 10, 2025
- Add debug prints in Eagle3LlamaForCausalLM and LlamaForCausalLMEagle3 constructors
- This will help identify if our custom class is being instantiated
- Needed to debug why matrix dimension mismatch still occurs
- Adopt main branch's solution for LlamaForCausalLMEagle3 mapping
- LlamaForCausalLMEagle3 now maps to Eagle3LlamaForCausalLM class
- Re-enable EAGLE3 tests as fixed in main branch
- Remove TODO comments that are no longer needed
- Remove debug prints from Eagle3LlamaForCausalLM and LlamaForCausalLMEagle3
- Main branch has already fixed the EAGLE3 issues
- Clean up code for production use
mergify bot removed the needs-rebase label on Sep 10, 2025
- Add _ensure_draft_vocab_size method to Eagle3LlamaForCausalLM class
- This ensures draft_vocab_size and target_hidden_size are properly set
- Fixes TypeError when draft_vocab_size is None in ParallelLMHead initialization
- Now LlamaForCausalLMEagle3 models can work with Eagle3LlamaForCausalLM class
- Remove empty string checks in SpeculativeConfig.__post_init__
- Remove empty string checks in _verify_quantization method
- Empty strings should be handled at CLI level via optional_type function
- Python API should not pass empty strings for quantization methods
- Add else branch to set quant_cfg = None when quant_algo is None
- This prevents quant_cfg from being undefined in ModelOpt quantization handling
- Fixes issue with EAGLE3 models from NVIDIA that don't specify quant_algo
- Add debug prints in combine_hidden_states to show input/output shapes
- This will help identify why fc layer dimensions don't match
- Need to understand target_hidden_size vs actual input dimensions
- Add logic to resize fc layer based on actual weight dimensions
- When target_hidden_size is None, infer correct dimensions from loaded weights
- This fixes the matrix dimension mismatch (16128 vs 8640) issue
- Update target_hidden_size in config after resizing for consistency
- Move dynamic fc layer resizing from load_weights to combine_hidden_states
- This ensures the layer is resized at runtime when dimensions don't match
- Add more debug output to track the resizing process
- This should fix the matrix dimension mismatch issue
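
Taken together, the resize commits above describe something like the following inside the draft model's combine_hidden_states. This is a sketch only (a plain nn.Linear stands in for vLLM's parallel linear layer, and the 8640/2880 shapes come from the error quoted earlier); as the next comment notes, the dimension issue was still not fully resolved.

```python
import torch
from torch import nn


class Eagle3DraftModelSketch(nn.Module):
    """Stand-in for the Eagle3 draft model; only the resize path is shown."""

    def __init__(self, draft_hidden: int = 2880, fc_in: int = 8640) -> None:
        super().__init__()
        self.fc = nn.Linear(fc_in, draft_hidden, bias=False)

    def combine_hidden_states(self, hidden_states: torch.Tensor) -> torch.Tensor:
        in_features = hidden_states.shape[-1]
        if self.fc.in_features != in_features:
            # Rebuild the projection with the actual input width (e.g. 16128
            # for GPT-OSS-120B auxiliary states) and keep it on the same
            # device/dtype as the incoming activations.
            self.fc = nn.Linear(
                in_features, self.fc.out_features, bias=False
            ).to(device=hidden_states.device, dtype=hidden_states.dtype)
        return self.fc(hidden_states)
```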
@baonudesifeizhai (Contributor, Author)

Still not fixed; there is still some problem with chat.

Labels
gpt-oss (Related to GPT-OSS models), llama (Related to Llama models), new-model (Requests to new models), speculative-decoding, v1