[Model] add Hunyuan V1 Dense Model support. #21368


Merged
merged 4 commits into from
Jul 23, 2025

Conversation

kzjeef
Contributor

@kzjeef kzjeef commented Jul 22, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

python3 -m vllm.entrypoints.openai.api_server --model tencent/Hunyuan-7B-Instruct-0124 --trust_remote_code

Test Result

The server will start normally:

INFO:     Started server process [1497668]                                                                  
INFO:     Waiting for application startup.                                             
INFO:     Application startup complete. 
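
Once the server reports startup complete, its OpenAI-compatible endpoint can be exercised with a chat-completions request. A minimal sketch using only the standard library (port 8000 is vLLM's default; the payload shape follows the OpenAI chat API — the prompt text and `max_tokens` value are illustrative):

```python
import json
import urllib.request

# Chat-completions payload for the OpenAI-compatible endpoint exposed by
# vllm.entrypoints.openai.api_server (default port 8000).
payload = {
    "model": "tencent/Hunyuan-7B-Instruct-0124",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, urllib.request.urlopen(req) would send the
# request and return the completion as JSON.
```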

(Optional) Documentation Update

@mergify mergify bot added the new-model Requests to new models label Jul 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for Hunyuan V1 Dense Model by refactoring the existing MoE implementation into a shared module. A base class is introduced, and either MoE or dense MLP blocks are used conditionally. A critical issue was identified in the _is_moe helper function related to type checking, and a code suggestion has been provided to address it.

Signed-off-by: Asher Zhang <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they run only the fastcheck CI, a small, essential subset of tests meant to catch errors quickly. You can run additional CI tests by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Collaborator

@jeejeelee jeejeelee left a comment


Can this model inherit from llama, just like phi3 does

@kzjeef
Contributor Author

kzjeef commented Jul 22, 2025

Can this model inherit from llama, just like phi3 does

@jeejeelee Thanks for the suggestion. Actually, we'd like to keep it standalone.
The main reason is that our model file and config handling differ from Llama's,
and we also have some upcoming features that are not upstreamed yet;
a standalone file is easier to maintain between the internal and public models.

@jeejeelee
Collaborator

jeejeelee commented Jul 23, 2025

@DarkLight1337 @Isotr0py Could you please take another look?

Collaborator

@jeejeelee jeejeelee left a comment


Please update the dense model in vllm/tests/models/registry.py

Collaborator

@Isotr0py Isotr0py left a comment


Can you also update docs/models/supported_models.md with dense model?

kzjeef added 2 commits July 23, 2025 11:26
Signed-off-by: Asher Zhang <[email protected]>
@kzjeef kzjeef requested review from hmellor and ywang96 as code owners July 23, 2025 03:59
@mergify mergify bot added the documentation Improvements or additions to documentation label Jul 23, 2025
@kzjeef
Contributor Author

kzjeef commented Jul 23, 2025

Can you also update docs/models/supported_models.md with dense model?

done

@kzjeef
Contributor Author

kzjeef commented Jul 23, 2025

Please update the dense model in vllm/tests/models/registry.py

done
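
The requested registry update amounts to registering the dense architecture alongside a representative HF checkpoint so the initialization tests can instantiate it. A self-contained sketch of the idea — the mapping below is an illustrative stand-in, and the `HunYuanDenseV1ForCausalLM` architecture name is an assumption to be checked against the merged diff rather than vLLM's actual registry structure:

```python
from types import MappingProxyType

# Illustrative stand-in for an example-model registry: maps an
# architecture name to a representative HF checkpoint.
_EXAMPLE_MODELS = {
    # Existing MoE entry (checkpoint name illustrative).
    "HunYuanMoEV1ForCausalLM": "tencent/Hunyuan-A13B-Instruct",
    # New dense entry added by this PR (architecture name assumed).
    "HunYuanDenseV1ForCausalLM": "tencent/Hunyuan-7B-Instruct-0124",
}
# Read-only view, mirroring how such registries are typically consumed.
HF_EXAMPLE_MODELS = MappingProxyType(_EXAMPLE_MODELS)
print("HunYuanDenseV1ForCausalLM" in HF_EXAMPLE_MODELS)  # True
```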

@kzjeef kzjeef requested a review from jeejeelee July 23, 2025 06:49
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 23, 2025
@DarkLight1337 DarkLight1337 added this to the v0.10.0 milestone Jul 23, 2025
@kzjeef
Contributor Author

kzjeef commented Jul 23, 2025

The two failed cases are not related to this patch:

They are both about Ernie4_5_ForCausalLM.

[2025-07-23T10:01:21Z] =================================== FAILURES ===================================

  | [2025-07-23T10:01:21Z] __________________ test_can_initialize[Ernie4_5_ForCausalLM] ___________________
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] model_arch = 'Ernie4_5_ForCausalLM'
  | [2025-07-23T10:01:21Z] monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f6068357440>
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] @pytest.mark.parametrize("model_arch", HF_EXAMPLE_MODELS.get_supported_archs())
  | [2025-07-23T10:01:21Z] def test_can_initialize(model_arch: str, monkeypatch: pytest.MonkeyPatch):
  | [2025-07-23T10:01:21Z] > can_initialize(model_arch, monkeypatch, HF_EXAMPLE_MODELS)
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] models/test_initialization.py:135:
  | [2025-07-23T10:01:21Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] args = ('Ernie4_5_ForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f6068357440>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>)
  | [2025-07-23T10:01:21Z] kwargs = {}, Skipped = <class 'Skipped'>, pid = 4691, pgid = 19, _pid = 4691
  | [2025-07-23T10:01:21Z] _exitcode = 256, old_signal_handler = <Handlers.SIG_DFL: 0>
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] @functools.wraps(f)
  | [2025-07-23T10:01:21Z] def wrapper(*args: _P.args, **kwargs: _P.kwargs) -> None:
  | [2025-07-23T10:01:21Z] # Make the process the leader of its own process group
  | [2025-07-23T10:01:21Z] # to avoid sending SIGTERM to the parent process
  | [2025-07-23T10:01:21Z] os.setpgrp()
  | [2025-07-23T10:01:21Z] from _pytest.outcomes import Skipped
  | [2025-07-23T10:01:21Z] pid = os.fork()
  | [2025-07-23T10:01:21Z] print(f"Fork a new process to run a test {pid}")
  | [2025-07-23T10:01:21Z] if pid == 0:
  | [2025-07-23T10:01:21Z] try:
  | [2025-07-23T10:01:21Z] f(*args, **kwargs)
  | [2025-07-23T10:01:21Z] except Skipped as e:
  | [2025-07-23T10:01:21Z] # convert Skipped to exit code 0
  | [2025-07-23T10:01:21Z] print(str(e))
  | [2025-07-23T10:01:21Z] os._exit(0)
  | [2025-07-23T10:01:21Z] except Exception:
  | [2025-07-23T10:01:21Z] import traceback
  | [2025-07-23T10:01:21Z] traceback.print_exc()
  | [2025-07-23T10:01:21Z] os._exit(1)
  | [2025-07-23T10:01:21Z] else:
  | [2025-07-23T10:01:21Z] os._exit(0)
  | [2025-07-23T10:01:21Z] else:
  | [2025-07-23T10:01:21Z] pgid = os.getpgid(pid)
  | [2025-07-23T10:01:21Z] _pid, _exitcode = os.waitpid(pid, 0)
  | [2025-07-23T10:01:21Z] # ignore SIGTERM signal itself
  | [2025-07-23T10:01:21Z] old_signal_handler = signal.signal(signal.SIGTERM, signal.SIG_IGN)
  | [2025-07-23T10:01:21Z] # kill all child processes
  | [2025-07-23T10:01:21Z] os.killpg(pgid, signal.SIGTERM)
  | [2025-07-23T10:01:21Z] # restore the signal handler
  | [2025-07-23T10:01:21Z] signal.signal(signal.SIGTERM, old_signal_handler)
  | [2025-07-23T10:01:21Z] > assert _exitcode == 0, (f"function {f} failed when called with"
  | [2025-07-23T10:01:21Z] f" args {args} and kwargs {kwargs}")
  | [2025-07-23T10:01:21Z] E AssertionError: function <function can_initialize at 0x7f606836d940> failed when called with args ('Ernie4_5_ForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f6068357440>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>) and kwargs {}
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] utils.py:761: AssertionError
  | [2025-07-23T10:01:21Z] _________________ test_can_initialize[Ernie4_5_MoeForCausalLM] _________________
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] model_arch = 'Ernie4_5_MoeForCausalLM'
  | [2025-07-23T10:01:21Z] monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f605c69be00>
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] @pytest.mark.parametrize("model_arch", HF_EXAMPLE_MODELS.get_supported_archs())
  | [2025-07-23T10:01:21Z] def test_can_initialize(model_arch: str, monkeypatch: pytest.MonkeyPatch):
  | [2025-07-23T10:01:21Z] > can_initialize(model_arch, monkeypatch, HF_EXAMPLE_MODELS)
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] models/test_initialization.py:135:
  | [2025-07-23T10:01:21Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] args = ('Ernie4_5_MoeForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f605c69be00>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>)
  | [2025-07-23T10:01:21Z] kwargs = {}, Skipped = <class 'Skipped'>, pid = 4692, pgid = 19, _pid = 4692
  | [2025-07-23T10:01:21Z] _exitcode = 256, old_signal_handler = <Handlers.SIG_DFL: 0>
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] @functools.wraps(f)
  | [2025-07-23T10:01:21Z] def wrapper(*args: _P.args, **kwargs: _P.kwargs) -> None:
  | [2025-07-23T10:01:21Z] # Make the process the leader of its own process group
  | [2025-07-23T10:01:21Z] # to avoid sending SIGTERM to the parent process
  | [2025-07-23T10:01:21Z] os.setpgrp()
  | [2025-07-23T10:01:21Z] from _pytest.outcomes import Skipped
  | [2025-07-23T10:01:21Z] pid = os.fork()
  | [2025-07-23T10:01:21Z] print(f"Fork a new process to run a test {pid}")
  | [2025-07-23T10:01:21Z] if pid == 0:
  | [2025-07-23T10:01:21Z] try:
  | [2025-07-23T10:01:21Z] f(*args, **kwargs)
  | [2025-07-23T10:01:21Z] except Skipped as e:
  | [2025-07-23T10:01:21Z] # convert Skipped to exit code 0
  | [2025-07-23T10:01:21Z] print(str(e))
  | [2025-07-23T10:01:21Z] os._exit(0)
  | [2025-07-23T10:01:21Z] except Exception:
  | [2025-07-23T10:01:21Z] import traceback
  | [2025-07-23T10:01:21Z] traceback.print_exc()
  | [2025-07-23T10:01:21Z] os._exit(1)
  | [2025-07-23T10:01:21Z] else:
  | [2025-07-23T10:01:21Z] os._exit(0)
  | [2025-07-23T10:01:21Z] else:
  | [2025-07-23T10:01:21Z] pgid = os.getpgid(pid)
  | [2025-07-23T10:01:21Z] _pid, _exitcode = os.waitpid(pid, 0)
  | [2025-07-23T10:01:21Z] # ignore SIGTERM signal itself
  | [2025-07-23T10:01:21Z] old_signal_handler = signal.signal(signal.SIGTERM, signal.SIG_IGN)
  | [2025-07-23T10:01:21Z] # kill all child processes
  | [2025-07-23T10:01:21Z] os.killpg(pgid, signal.SIGTERM)
  | [2025-07-23T10:01:21Z] # restore the signal handler
  | [2025-07-23T10:01:21Z] signal.signal(signal.SIGTERM, old_signal_handler)
  | [2025-07-23T10:01:21Z] > assert _exitcode == 0, (f"function {f} failed when called with"
  | [2025-07-23T10:01:21Z] f" args {args} and kwargs {kwargs}")
  | [2025-07-23T10:01:21Z] E AssertionError: function <function can_initialize at 0x7f606836d940> failed when called with args ('Ernie4_5_MoeForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f605c69be00>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>) and kwargs {}
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] utils.py:761: AssertionError
  | [2025-07-23T10:01:21Z] =============================== warnings summary ===============================
  | [2025-07-23T10:01:21Z] ../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
  | [2025-07-23T10:01:21Z] /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
  | [2025-07-23T10:01:21Z] ref_error: type[Exception] = jsonschema.RefResolutionError,
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] tests/models/test_initialization.py: 181 warnings
  | [2025-07-23T10:01:21Z] /vllm-workspace/tests/utils.py:737: DeprecationWarning: This process (pid=19) is multi-threaded, use of fork() may lead to deadlocks in the child.
  | [2025-07-23T10:01:21Z] pid = os.fork()
  | [2025-07-23T10:01:21Z]
  | [2025-07-23T10:01:21Z] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
  | [2025-07-23T10:01:21Z] =========================== short test summary info ============================
  | [2025-07-23T10:01:21Z] FAILED models/test_initialization.py::test_can_initialize[Ernie4_5_ForCausalLM] - AssertionError: function <function can_initialize at 0x7f606836d940> failed when called with args ('Ernie4_5_ForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f6068357440>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>) and kwargs {}
  | [2025-07-23T10:01:21Z] FAILED models/test_initialization.py::test_can_initialize[Ernie4_5_MoeForCausalLM] - AssertionError: function <function can_initialize at 0x7f606836d940> failed when called with args ('Ernie4_5_MoeForCausalLM', <_pytest.monkeypatch.MonkeyPatch object at 0x7f605c69be00>, <tests.models.registry.HfExampleModels object at 0x7f60896f6a20>) and kwargs {}
  | [2025-07-23T10:01:21Z] =========== 2 failed, 179 passed, 182 warnings in 2005.73s (0:33:25) ===========

@vllm-bot vllm-bot merged commit 2671334 into vllm-project:main Jul 23, 2025
65 of 68 checks passed
@DarkLight1337
Member

LGTM, thanks for updating!

@kzjeef
Contributor Author

kzjeef commented Jul 23, 2025

LGTM, thanks for updating!

Thanks!

Labels
documentation (Improvements or additions to documentation), new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed)
5 participants