[Model] add Hunyuan V1 Dense Model support. #21368


Merged
merged 4 commits into from
Jul 23, 2025

Conversation

kzjeef
Contributor

@kzjeef kzjeef commented Jul 22, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

python3 -m vllm.entrypoints.openai.api_server --model tencent/Hunyuan-7B-Instruct-0124 --trust_remote_code

Test Result

The server will start normally:

INFO:     Started server process [1497668]                                                                  
INFO:     Waiting for application startup.                                             
INFO:     Application startup complete. 
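
Once the server reports startup complete, its OpenAI-compatible endpoint can be exercised with a chat-completions request. A minimal sketch using only the standard library (port 8000 is vLLM's default; the payload shape follows the OpenAI chat API — the prompt text and `max_tokens` value are illustrative):

```python
import json
import urllib.request

# Chat-completions payload for the OpenAI-compatible endpoint exposed by
# vllm.entrypoints.openai.api_server (default port 8000).
payload = {
    "model": "tencent/Hunyuan-7B-Instruct-0124",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, urllib.request.urlopen(req) would send the
# request and return the completion as JSON.
```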

(Optional) Documentation Update

@mergify mergify bot added the new-model Requests to new models label Jul 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for Hunyuan V1 Dense Model by refactoring the existing MoE implementation into a shared module. A base class is introduced, and either MoE or dense MLP blocks are used conditionally. A critical issue was identified in the _is_moe helper function related to type checking, and a code suggestion has been provided to address it.

Signed-off-by: Asher Zhang <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they run only the fastcheck CI, a small, essential subset of tests meant to catch errors quickly. You can run additional CI tests by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Collaborator

@jeejeelee jeejeelee left a comment


Can this model inherit from llama, just like phi3 does

@kzjeef
Contributor Author

kzjeef commented Jul 22, 2025

Can this model inherit from llama, just like phi3 does

@jeejeelee Thanks for the suggestion. Actually, we'd like to keep it standalone.
The main reason is that our model file and config handling differ from Llama's,
and we also have some upcoming features that are not upstreamed yet;
a standalone file is easier to maintain between the internal and public models.

@jeejeelee
Collaborator

jeejeelee commented Jul 23, 2025

@DarkLight1337 @Isotr0py Could you please take another look?

Collaborator

@jeejeelee jeejeelee left a comment


Please update the dense model in vllm/tests/models/registry.py

Collaborator

@Isotr0py Isotr0py left a comment


Can you also update docs/models/supported_models.md with dense model?

kzjeef added 2 commits July 23, 2025 11:26
Signed-off-by: Asher Zhang <[email protected]>
@kzjeef kzjeef requested review from hmellor and ywang96 as code owners July 23, 2025 03:59
@mergify mergify bot added the documentation Improvements or additions to documentation label Jul 23, 2025
@kzjeef
Contributor Author

kzjeef commented Jul 23, 2025

Can you also update docs/models/supported_models.md with dense model?

done

@kzjeef
Contributor Author

kzjeef commented Jul 23, 2025

Please update the dense model in vllm/tests/models/registry.py

done
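
The requested registry update amounts to registering the dense architecture alongside a representative HF checkpoint so the initialization tests can instantiate it. A self-contained sketch of the idea — the mapping below is an illustrative stand-in, and the `HunYuanDenseV1ForCausalLM` architecture name is an assumption to be checked against the merged diff rather than vLLM's actual registry structure:

```python
from types import MappingProxyType

# Illustrative stand-in for an example-model registry: maps an
# architecture name to a representative HF checkpoint.
_EXAMPLE_MODELS = {
    # Existing MoE entry (checkpoint name illustrative).
    "HunYuanMoEV1ForCausalLM": "tencent/Hunyuan-A13B-Instruct",
    # New dense entry added by this PR (architecture name assumed).
    "HunYuanDenseV1ForCausalLM": "tencent/Hunyuan-7B-Instruct-0124",
}
# Read-only view, mirroring how such registries are typically consumed.
HF_EXAMPLE_MODELS = MappingProxyType(_EXAMPLE_MODELS)
print("HunYuanDenseV1ForCausalLM" in HF_EXAMPLE_MODELS)  # True
```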

@kzjeef kzjeef requested a review from jeejeelee July 23, 2025 06:49
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 23, 2025
@DarkLight1337 DarkLight1337 added this to the v0.10.0 milestone Jul 23, 2025
@kzjeef
Contributor Author

kzjeef commented Jul 23, 2025

The two failed cases are not related to this patch:

They are both about Ernie4_5_ForCausalLM.

[2025-07-23T10:01:21Z] =================================== FAILURES ===================================

  | [2025-07-23T10:01:21Z] __________________ test_can_initialize[Ernie4_5_ForCausalLM] ___________________
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] model_arch = 'Ernie4_5_ForCausalLM'
  | [2025-07-23T10:01:21Z] monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f6068357440>
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] @pytest.mark.parametrize("model_arch", HF_EXAMPLE_MODELS.get_supported_archs())
  | [2025-07-23T10:01:21Z] def test_can_initialize(model_arch: str, monkeypatch: pytest.MonkeyPatch):
  | [2025-07-23T10:01:21Z] > can_initialize(model_arch, monkeypatch, HF_EXAMPLE_MODELS)
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] models/test_initialization.py:135:
  | [2025-07-23T10:01:21Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] args = ('Ernie4_5_ForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f6068357440>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>)
  | [2025-07-23T10:01:21Z] kwargs = {}, Skipped = <class 'Skipped'>, pid = 4691, pgid = 19, _pid = 4691
  | [2025-07-23T10:01:21Z] _exitcode = 256, old_signal_handler = <Handlers.SIG_DFL: 0>
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] @functools.wraps(f)
  | [2025-07-23T10:01:21Z] def wrapper(*args: _P.args, **kwargs: _P.kwargs) -> None:
  | [2025-07-23T10:01:21Z] # Make the process the leader of its own process group
  | [2025-07-23T10:01:21Z] # to avoid sending SIGTERM to the parent process
  | [2025-07-23T10:01:21Z] os.setpgrp()
  | [2025-07-23T10:01:21Z] from _pytest.outcomes import Skipped
  | [2025-07-23T10:01:21Z] pid = os.fork()
  | [2025-07-23T10:01:21Z] print(f"Fork a new process to run a test {pid}")
  | [2025-07-23T10:01:21Z] if pid == 0:
  | [2025-07-23T10:01:21Z] try:
  | [2025-07-23T10:01:21Z] f(*args, **kwargs)
  | [2025-07-23T10:01:21Z] except Skipped as e:
  | [2025-07-23T10:01:21Z] # convert Skipped to exit code 0
  | [2025-07-23T10:01:21Z] print(str(e))
  | [2025-07-23T10:01:21Z] os._exit(0)
  | [2025-07-23T10:01:21Z] except Exception:
  | [2025-07-23T10:01:21Z] import traceback
  | [2025-07-23T10:01:21Z] traceback.print_exc()
  | [2025-07-23T10:01:21Z] os._exit(1)
  | [2025-07-23T10:01:21Z] else:
  | [2025-07-23T10:01:21Z] os._exit(0)
  | [2025-07-23T10:01:21Z] else:
  | [2025-07-23T10:01:21Z] pgid = os.getpgid(pid)
  | [2025-07-23T10:01:21Z] _pid, _exitcode = os.waitpid(pid, 0)
  | [2025-07-23T10:01:21Z] # ignore SIGTERM signal itself
  | [2025-07-23T10:01:21Z] old_signal_handler = signal.signal(signal.SIGTERM, signal.SIG_IGN)
  | [2025-07-23T10:01:21Z] # kill all child processes
  | [2025-07-23T10:01:21Z] os.killpg(pgid, signal.SIGTERM)
  | [2025-07-23T10:01:21Z] # restore the signal handler
  | [2025-07-23T10:01:21Z] signal.signal(signal.SIGTERM, old_signal_handler)
  | [2025-07-23T10:01:21Z] > assert _exitcode == 0, (f"function {f} failed when called with"
  | [2025-07-23T10:01:21Z] f" args {args} and kwargs {kwargs}")
  | [2025-07-23T10:01:21Z] E AssertionError: function <function can_initialize at 0x7f606836d940> failed when called with args ('Ernie4_5_ForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f6068357440>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>) and kwargs {}
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] utils.py:761: AssertionError
  | [2025-07-23T10:01:21Z] _________________ test_can_initialize[Ernie4_5_MoeForCausalLM] _________________
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] model_arch = 'Ernie4_5_MoeForCausalLM'
  | [2025-07-23T10:01:21Z] monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f605c69be00>
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] @pytest.mark.parametrize("model_arch", HF_EXAMPLE_MODELS.get_supported_archs())
  | [2025-07-23T10:01:21Z] def test_can_initialize(model_arch: str, monkeypatch: pytest.MonkeyPatch):
  | [2025-07-23T10:01:21Z] > can_initialize(model_arch, monkeypatch, HF_EXAMPLE_MODELS)
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] models/test_initialization.py:135:
  | [2025-07-23T10:01:21Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] args = ('Ernie4_5_MoeForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f605c69be00>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>)
  | [2025-07-23T10:01:21Z] kwargs = {}, Skipped = <class 'Skipped'>, pid = 4692, pgid = 19, _pid = 4692
  | [2025-07-23T10:01:21Z] _exitcode = 256, old_signal_handler = <Handlers.SIG_DFL: 0>
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] @functools.wraps(f)
  | [2025-07-23T10:01:21Z] def wrapper(*args: _P.args, **kwargs: _P.kwargs) -> None:
  | [2025-07-23T10:01:21Z] # Make the process the leader of its own process group
  | [2025-07-23T10:01:21Z] # to avoid sending SIGTERM to the parent process
  | [2025-07-23T10:01:21Z] os.setpgrp()
  | [2025-07-23T10:01:21Z] from _pytest.outcomes import Skipped
  | [2025-07-23T10:01:21Z] pid = os.fork()
  | [2025-07-23T10:01:21Z] print(f"Fork a new process to run a test {pid}")
  | [2025-07-23T10:01:21Z] if pid == 0:
  | [2025-07-23T10:01:21Z] try:
  | [2025-07-23T10:01:21Z] f(*args, **kwargs)
  | [2025-07-23T10:01:21Z] except Skipped as e:
  | [2025-07-23T10:01:21Z] # convert Skipped to exit code 0
  | [2025-07-23T10:01:21Z] print(str(e))
  | [2025-07-23T10:01:21Z] os._exit(0)
  | [2025-07-23T10:01:21Z] except Exception:
  | [2025-07-23T10:01:21Z] import traceback
  | [2025-07-23T10:01:21Z] traceback.print_exc()
  | [2025-07-23T10:01:21Z] os._exit(1)
  | [2025-07-23T10:01:21Z] else:
  | [2025-07-23T10:01:21Z] os._exit(0)
  | [2025-07-23T10:01:21Z] else:
  | [2025-07-23T10:01:21Z] pgid = os.getpgid(pid)
  | [2025-07-23T10:01:21Z] _pid, _exitcode = os.waitpid(pid, 0)
  | [2025-07-23T10:01:21Z] # ignore SIGTERM signal itself
  | [2025-07-23T10:01:21Z] old_signal_handler = signal.signal(signal.SIGTERM, signal.SIG_IGN)
  | [2025-07-23T10:01:21Z] # kill all child processes
  | [2025-07-23T10:01:21Z] os.killpg(pgid, signal.SIGTERM)
  | [2025-07-23T10:01:21Z] # restore the signal handler
  | [2025-07-23T10:01:21Z] signal.signal(signal.SIGTERM, old_signal_handler)
  | [2025-07-23T10:01:21Z] > assert _exitcode == 0, (f"function {f} failed when called with"
  | [2025-07-23T10:01:21Z] f" args {args} and kwargs {kwargs}")
  | [2025-07-23T10:01:21Z] E AssertionError: function <function can_initialize at 0x7f606836d940> failed when called with args ('Ernie4_5_MoeForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f605c69be00>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>) and kwargs {}
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] utils.py:761: AssertionError
  | [2025-07-23T10:01:21Z] =============================== warnings summary ===============================
  | [2025-07-23T10:01:21Z] ../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
  | [2025-07-23T10:01:21Z] /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
  | [2025-07-23T10:01:21Z] ref_error: type[Exception] = jsonschema.RefResolutionError,
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] tests/models/test_initialization.py: 181 warnings
  | [2025-07-23T10:01:21Z] /vllm-workspace/tests/utils.py:737: DeprecationWarning: This process (pid=19) is multi-threaded, use of fork() may lead to deadlocks in the child.
  | [2025-07-23T10:01:21Z] pid = os.fork()
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
  | [2025-07-23T10:01:21Z] =========================== short test summary info ============================
  | [2025-07-23T10:01:21Z] FAILED models/test_initialization.py::test_can_initialize[Ernie4_5_ForCausalLM] - AssertionError: function <function can_initialize at 0x7f606836d940> failed when called with args ('Ernie4_5_ForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f6068357440>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>) and kwargs {}
  | [2025-07-23T10:01:21Z] FAILED models/test_initialization.py::test_can_initialize[Ernie4_5_MoeForCausalLM] - AssertionError: function <function can_initialize at 0x7f606836d940> failed when called with args ('Ernie4_5_MoeForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f605c69be00>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>) and kwargs {}
  | [2025-07-23T10:01:21Z] =========== 2 failed, 179 passed, 182 warnings in 2005.73s (0:33:25) ===========

@vllm-bot vllm-bot merged commit 2671334 into vllm-project:main Jul 23, 2025
65 of 68 checks passed
@DarkLight1337
Member

LGTM, thanks for updating!

@kzjeef
Contributor Author

kzjeef commented Jul 23, 2025

LGTM, thanks for updating!

Thanks!

Labels
documentation (Improvements or additions to documentation), new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed)
5 participants