[Model] EXAONE 4.0 model support #21060
Conversation
Signed-off-by: Deepfocused <[email protected]>
… configuration externally accessible. Signed-off-by: Deepfocused <[email protected]>
Signed-off-by: Deepfocused <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request introduces support for the EXAONE 4.0 model, including its architecture implementation, configuration, and necessary registry updates. The changes are well-structured and mostly follow the existing patterns in vLLM.
My review identified two significant issues in the `Exaone4Attention` implementation that would lead to incorrect model behavior. One is a critical bug where rotary embeddings are not applied correctly; the other is a high-severity bug in the sliding-window initialization logic, which is the root cause of the first. I have provided detailed feedback and code suggestions to address these issues.
Once these issues are resolved, the implementation should be solid. Thank you for your contribution!
```python
if self.sliding_window:
    q, k = self.rotary_emb(positions, q, k)
attn_output = self.attn(q, k, v)
```
The rotary embeddings are only applied if `self.sliding_window` is not `None`. This is incorrect: rotary embeddings should be applied to queries and keys unconditionally for all layers. The current implementation will produce incorrect model outputs for layers that do not use sliding-window attention, because their queries and keys will never be rotated.
Suggested change:

```diff
-if self.sliding_window:
-    q, k = self.rotary_emb(positions, q, k)
+q, k = self.rotary_emb(positions, q, k)
 attn_output = self.attn(q, k, v)
```
This looks like it's ready to go.
Please also add this model to the Supported Models page
Signed-off-by: Deepfocused <[email protected]>
Completed.
Hi, thanks for supporting EXAONE-4.0 on vLLM.
Could you elaborate?
The first part (…). For the second part (…).
I'll compare it once more with the original EXAONE4 code, make the revisions, and update.
@lkm2835 I plan to add exception handling for the 1.2B model.
```python
    "residual": residual
})
# ...
hidden_states, _ = self.norm(hidden_states, residual)
```
Based on the EXAONE4 Transformers code, it seems that `residual` should be removed.
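A minimal sketch of the change being suggested, using a toy RMSNorm (the real vLLM `RMSNorm` has a fused residual path; the names and values here are illustrative only):

```python
import math

def rms_norm(x, eps=1e-6):
    """Toy RMSNorm with no fused residual connection."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

# Before (fused residual add): hidden_states, _ = self.norm(hidden_states, residual)
# After (residual removed):    hidden_states = self.norm(hidden_states)
hidden_states = rms_norm([2.0, -2.0, 2.0, -2.0])
```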
ok i got it
```python
if self.sliding_window:
    q, k = self.rotary_emb(positions, q, k)
```
The EXAONE-4.0-1.2B config should be considered in this conditional statement. That model uses rotary embeddings even though `sliding_window` is set to `None`. You can refer to this part: transformers/exaone4_modeling.py
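As a hedged sketch of the distinction being described here, assuming (per this comment and the surrounding discussion) that hybrid EXAONE-4.0 checkpoints apply RoPE only on sliding-window layers while EXAONE-4.0-1.2B sets `sliding_window=None` yet still applies RoPE on every layer. `uses_rope` is an illustrative helper, not vLLM or Transformers API:

```python
def uses_rope(sliding_window, is_hybrid):
    """Decide whether a layer should apply rotary embeddings."""
    if not is_hybrid:
        # e.g. EXAONE-4.0-1.2B: no sliding window, but RoPE on every layer
        return True
    # hybrid checkpoints: RoPE only on local (sliding-window) layers
    return sliding_window is not None
```

The point is that `sliding_window` alone cannot be the gate; the decision also depends on which checkpoint variant is loaded.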
ok i got it
@Deepfocused
…nsidering EXAONE-4.0-1.2B. Signed-off-by: Deepfocused <[email protected]>
@DarkLight1337 @chriswritescode-dev @lkm2835 @andrew 👨🌾 I've completed the code modifications, verified they work correctly, and recommitted the changes. The last changes are as follows: (1) remove `residual` from `self.norm`. 😸Thanks a lot!😸
@lkm2835 Can you confirm whether it is correct now?
Signed-off-by: woongsik <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
You should use Git to resolve the merge conflicts.
Signed-off-by: woongsik <[email protected]>
The row is missing again.
@DarkLight1337 Adding the following to `docs/models/supported_models.md` causes a merge conflict:
Signed-off-by: woongsik <[email protected]>
Signed-off-by: woongsik <[email protected]>
```diff
@@ -87,6 +88,7 @@ def _get_hf_token() -> Optional[str]:
     "medusa": MedusaConfig,
     "eagle": EAGLEConfig,
     "exaone": ExaoneConfig,
+    "exaone4": Exaone4Config,
```
I have another question. Is `Exaone4Config` necessary, or can we just use the default EXAONE4 HF config?
Signed-off-by: Deepfocused <[email protected]> Signed-off-by: woongsik <[email protected]> Signed-off-by: Himanshu Jaju <[email protected]>
Signed-off-by: Deepfocused <[email protected]> Signed-off-by: woongsik <[email protected]>
@Deepfocused I can't replicate it and make it engage in reasoning. Here is my server launch command: … Do you have any idea what I'm doing wrong? I even tried the chat template I use for llama.cpp (which works there), where I simply forced the … But still no success.
Hi @blakkd
Removing … --> Actually, I do get the reasoning triggering, but the opening `<think>` tag doesn't show up in the response.
Why would it make it impossible to identify the starting point when we force the token ourselves to mark the starting point? I'm so confused :/ I'm such a noob. But if you ever feel this is not the place to post, please tell me; I would totally understand and open a discussion instead!
@blakkd
Of course :D But I'd rather open a discussion instead; I don't feel I've dug in enough to post an issue.
I created an issue #21718.
Oh, I saw this too late; sorry about that. It seems I didn't explain `skip_think` in enough detail. Sorry for the confusion.
@Deepfocused Thanks for taking the time to provide the reformulation! So, let's say in the general case where we want a multi-turn discussion, we should set … Nevertheless, @hongseok-oh, thanks for your initiative in taking the reins and posting the issue.
@blakkd
Thanks! Sorry, my malfunctioning artificial brain was stuck because of the stop tokens.
Purpose
EXAONE 4.0 has finally been released, and I promptly integrated it with vLLM. I hope it works smoothly on vLLM v0.9.3.
This PR includes the modifications required to support the EXAONE 4 model, its configuration files, and other necessary code changes.
Usage
👀 For tool usage, use the hermes tool parser:
```shell
# tool
vllm serve ... --enable-auto-tool-choice --tool-call-parser hermes --chat-template chat_template.jinja
```
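As an illustration of what a client request against that server might look like, here is a sketch that only builds the JSON request body in the OpenAI-compatible format the hermes parser consumes. The model name, tool name, and schema are hypothetical placeholders, and nothing here contacts a server:

```python
import json

# Hypothetical tool definition (illustrative schema, not from this PR).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "exaone4",  # placeholder served model name
    "messages": [{"role": "user", "content": "What's the weather in Seoul?"}],
    "tools": tools,
    "tool_choice": "auto",
}
body = json.dumps(payload)  # POST this to the server's /v1/chat/completions
```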
🦾 Attention, please, for reasoning 🦾
💥 The reasoning parser for EXAONE 4 is currently under development. It will be added once completed. 😸Thanks