Skip to content

Conversation

jscslld
Copy link

@jscslld jscslld commented Sep 1, 2025

Purpose

This PR implements support for the Eagle 2.5 VL model by NVIDIA.

Test Plan

pytest tests/models/multimodal/generation/test_common.py -k eagle2_5_vl

Test Result

/usr/local/lib/python3.10/dist-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================================================================================== test session starts ===============================================================================================================
platform linux -- Python 3.10.12, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/guoc/workspace/clean_data/ld_data/vllm_release/vllm
configfile: pyproject.toml
plugins: xdist-3.7.0, dash-3.2.0, timeout-2.4.0, anyio-3.7.1, order-1.3.0, rerunfailures-15.1, hypothesis-6.135.14, hydra-core-1.3.2, asyncio-1.0.0, rich-0.2.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 328 items / 325 deselected / 3 selected                                                                                                                                                                                                

../../../../../../../../../../../../../***/vllm/tests/models/multimodal/generation/test_common.py ...                                                                                      [100%]

================================================================================================================ warnings summary ================================================================================================================
tests/models/multimodal/generation/test_common.py::test_single_image_models[eagle2_5_vl-test_case123]
  ***/vllm/tests/models/multimodal/generation/vlm_utils/core.py:153: UserWarning: Test0:
  hf:   'STOP<|im_end|>'
  vllm: 'STOP'
    comparator(

tests/models/multimodal/generation/test_common.py::test_single_image_models[eagle2_5_vl-test_case123]
  ***/vllm/tests/models/multimodal/generation/vlm_utils/core.py:153: UserWarning: Test0:
  hf:   'The season is spring, as indicated by the cherry blossoms in full bloom.<|im_end|>'
  vllm: 'The season is spring, as indicated by the cherry blossoms in full bloom.'
    comparator(

tests/models/multimodal/generation/test_common.py::test_multi_image_models[eagle2_5_vl-test_case97]
  ***/vllm/tests/models/multimodal/generation/vlm_utils/core.py:153: UserWarning: Test0:
  Matched tokens:       [1906, 12, 16, 25, 576, 2168, 61891, 264, 8592, 6109, 448, 264, 20469, 2518, 323, 6623, 8453, 5325, 3117, 11, 3881, 438, 264, 7106, 333, 524, 11, 16445, 8606, 8453, 42463, 5424, 323, 5766, 13, 758, 279, 39305, 11, 264, 2518, 2936, 1841, 374, 21810, 389, 264, 25026, 11, 323, 264, 3691, 1803, 374, 9842, 3267, 279, 5325, 3117, 13, 576, 8592, 374, 31293, 448, 5257, 8353, 62237, 11, 2670, 264, 3553, 448, 264, 1841, 429, 15804, 330, 38021, 2034, 1189, 2619, 525, 1083, 4158, 39032, 57902, 1320, 32334, 279, 5325, 3117, 11, 323, 279, 3082, 7952, 311, 387, 264, 89156, 15662, 4573]
  hf:   'Image-1: The image depicts a street scene with a prominent red and gold Chinese archway, known as a paifang, featuring traditional Chinese architectural elements and characters. In the foreground, a red stop sign is mounted on a pole, and a black car is driving past the archway. The street is lined with various commercial establishments, including a store with a sign that reads "OPTUS." There are also white lion statues flanking the archway, and the area appears to be a bustling urban environment.\n\nImage-2: The image captures a vibrant scene of cherry blossoms in full bloom, with delicate pink flowers dominating the' {382: -0.8799123764038086, 448: -0.8799123764038086, 13: -1.8799123764038086, 11: -4.754912376403809, 2337: -4.879912376403809}
  vllm: 'Image-1: The image depicts a street scene with a prominent red and gold Chinese archway, known as a paifang, featuring traditional Chinese architectural elements and characters. In the foreground, a red stop sign is mounted on a pole, and a black car is driving past the archway. The street is lined with various commercial establishments, including a store with a sign that reads "OPTUS." There are also white lion statues flanking the archway, and the area appears to be a bustling urban environment with pedestrians and vehicles.\n\nImage-2: The image captures a picturesque scene of cherry blossoms in full bloom, with delicate'   {448: Logprob(logprob=-0.7803637981414795, rank=1, decoded_token=' with'), 382: Logprob(logprob=-1.0303637981414795, rank=2, decoded_token='.\n\n'), 13: Logprob(logprob=-1.7803637981414795, rank=3, decoded_token='.'), 2337: Logprob(logprob=-4.905364036560059, rank=4, decoded_token=' during'), 11: Logprob(logprob=-4.905364036560059, rank=5, decoded_token=',')}
    comparator(

tests/models/multimodal/generation/test_common.py::test_video_models[eagle2_5_vl-video-test_case28]
  ***/vllm/tests/models/multimodal/generation/vlm_utils/core.py:153: UserWarning: Test0:
  Matched tokens:       [1986, 2766, 374, 15173, 1576, 432, 4933, 264, 8770, 12233, 61205, 28147, 1393, 4460, 311, 1349, 264, 2311, 13, 576, 12872, 1948, 279, 13673, 8770, 323, 279, 3460, 28147, 11450, 264, 469, 938, 9124, 13, 576, 8770, 594, 54249, 4774, 311, 1349, 11, 10856, 448, 279, 31577, 487, 315, 279, 6534, 11, 11367, 311, 279, 27385, 13, 1084, 594, 264, 34409, 323, 326, 1090, 1782, 471, 291]
  hf:   "This video is funny because it shows a baby wearing oversized glasses while trying to read a book. The contrast between the tiny baby and the large glasses creates a comical visual. The baby's earnest attempt to read, combined with the absurdity of the situation, adds to the humor. It's a charming and lighthearted moment that captures the innocence and curiosity of a young child in an amusing scenario.<|im_end|>" {6109: -0.7680288553237915, 4445: -0.7680288553237915, 72664: -3.768028736114502, 73933: -4.393028736114502, 1896: -4.393028736114502}
  vllm: "This video is funny because it shows a baby wearing oversized glasses while trying to read a book. The contrast between the tiny baby and the large glasses creates a comical visual. The baby's earnest attempt to read, combined with the absurdity of the situation, adds to the humor. It's a charming and lighthearted scene that captures the innocence and curiosity of a young child in an amusing scenario."    {6109: Logprob(logprob=-0.7076097130775452, rank=1, decoded_token=' scene'), 4445: Logprob(logprob=-0.8326097130775452, rank=2, decoded_token=' moment'), 72664: Logprob(logprob=-3.8326096534729004, rank=3, decoded_token=' depiction'), 73933: Logprob(logprob=-4.3326096534729, rank=4, decoded_token=' portrayal'), 1896: Logprob(logprob=-4.3326096534729, rank=5, decoded_token=' take')}
    comparator(

tests/models/multimodal/generation/test_common.py::test_video_models[eagle2_5_vl-video-test_case28]
  ***/vllm/tests/models/multimodal/generation/vlm_utils/core.py:153: UserWarning: Test1:
  Matched tokens:       [785, 2766, 374, 15173, 1576, 432, 4933, 264, 8770, 12233, 28147, 1393, 5290, 264, 2311, 13, 1096, 374, 59886, 1576, 23920, 11136, 1513, 944, 9850, 28147, 11, 323, 279, 13929, 315, 264, 2613, 1682, 448, 28147, 4460, 311, 1349, 11450, 264, 69846, 12872, 13, 576, 8770, 594, 6001, 93015, 323, 10735, 6168, 912, 311, 279]
  hf:   "The video is funny because it shows a baby wearing glasses while reading a book. This is amusing because babies typically don't wear glasses, and the sight of a small child with glasses trying to read creates a humorous contrast. The baby's serious demeanor and focused actions add to the comedy, as it's unexpected to see such a young child engaged in what appears to be a grown-up activity. The combination of the baby's appearance and behavior creates a charming and funny scene.<|im_end|>"      {22358: -0.7686885595321655, 94371: -0.7686885595321655, 469: -3.518688678741455, 27385: -3.768688678741455, 31253: -5.518688678741455}
  vllm: "The video is funny because it shows a baby wearing glasses while reading a book. This is amusing because babies typically don't wear glasses, and the sight of a small child with glasses trying to read creates a humorous contrast. The baby's serious demeanor and focused actions add to the comedic effect, as it's unexpected to see such a young child engaged in an activity usually associated with adults."    {94371: Logprob(logprob=-0.7115707993507385, rank=1, decoded_token=' comedic'), 22358: Logprob(logprob=-0.8365707993507385, rank=2, decoded_token=' comedy'), 469: Logprob(logprob=-3.4615707397460938, rank=3, decoded_token=' com'), 27385: Logprob(logprob=-3.7115707397460938, rank=4, decoded_token=' humor'), 9124: Logprob(logprob=-5.586570739746094, rank=5, decoded_token=' visual')}
    comparator(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================================================================================== 3 passed, 325 deselected, 5 warnings in 159.86s (0:02:39) ============================================================================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added multi-modality Related to multi-modality (#4194) new-model Requests to new models speculative-decoding labels Sep 1, 2025
Copy link
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@jscslld
Copy link
Author

jscslld commented Sep 1, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for the Eagle 2.5 VL model. The changes include the model implementation, registration, and test configurations. The implementation appears to be largely adapted from existing models, but there are several critical copy-paste errors, particularly in handling video embeddings, which could lead to runtime failures. Additionally, there are multiple instances of misleading comments and docstrings that should be corrected for better maintainability. I've provided specific comments and suggestions to address these issues.

Copy link

github-actions bot commented Sep 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@jscslld jscslld requested a review from hmellor as a code owner September 1, 2025 07:52
@mergify mergify bot added the documentation Improvements or additions to documentation label Sep 1, 2025
max_transformers_version="4.48", # noqa: E501
transformers_version_reason="HF model is not compatible.", # noqa: E501
hf_overrides={"architectures": ["DeepseekVLV2ForCausalLM"]}), # noqa: E501
"Eagle2_5_VLForConditionalGeneration": _HfExamplesInfo("nvidia/Eagle2.5-8B"), # noqa: E501
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add processor test in tests/models/multimodal/processing/test_common.py?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of Eagle 2.5VL on HF still appears to have some compatibility issues. We will add the processor test after fixing the code on HF.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eagle 2.5 VL Processor has been included in the test. We wrap the HF version of Eagle 2.5 VL Processor to make it compatible with the vllm test, so that the tests can directly test the HF version of Eagle 2.5 VL Processor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/models/multimodal/processing/test_common.py is used to make sure multimodal processor can work well with mm cache, which is unlikely to be covered by generation test.

Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Lu Lidong <[email protected]>
jscslld and others added 14 commits September 3, 2025 15:14
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Lu Lidong <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Lu Lidong <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Lu Lidong <[email protected]>
…e with vllm tests, so that tests can directly test the HF version of Eagle 2.5 VL Processor
@jscslld
Copy link
Author

jscslld commented Sep 12, 2025

when will this PR be reviewed and merged?

Copy link
Member

@Isotr0py Isotr0py left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the dealy! Just leave some minor comments. PTAL!

max_transformers_version="4.48", # noqa: E501
transformers_version_reason="HF model is not compatible.", # noqa: E501
hf_overrides={"architectures": ["DeepseekVLV2ForCausalLM"]}), # noqa: E501
"Eagle2_5_VLForConditionalGeneration": _HfExamplesInfo("nvidia/Eagle2.5-8B"), # noqa: E501
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/models/multimodal/processing/test_common.py is used to make sure multimodal processor can work well with mm cache, which is unlikely to be covered by generation test.

jscslld and others added 2 commits September 15, 2025 19:42
Copy link

mergify bot commented Sep 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jscslld.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation multi-modality Related to multi-modality (#4194) needs-rebase new-model Requests to new models speculative-decoding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants