[Feature]: Support dense MLP & rope for deepseek architecture #12686
Comments
Do you have a reference for a model that employs one of these two architectural changes?
In fact, pure RoPE without YaRN is already supported in deepseek_v2 and is used by the deepseek-vl2 models:
vllm/vllm/model_executor/models/deepseek_v2.py Lines 249 to 259 in 5d98d56
vllm/vllm/model_executor/models/deepseek_v2.py Lines 303 to 312 in 5d98d56
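A minimal sketch of the pattern around the referenced lines (modeled on deepseek_v2.py, not a verbatim copy): when the HF config carries no `rope_scaling` entry, `get_rope` builds a plain rotary embedding, and when it does, the dict is tagged for the DeepSeek YaRN variant. The `build_rotary_emb` wrapper is illustrative only.

```python
# Sketch of how pure RoPE falls out of the existing deepseek_v2 code path.
# get_rope is vLLM's rotary-embedding factory; build_rotary_emb is a
# hypothetical wrapper added here for illustration.
from typing import Optional

from vllm.model_executor.layers.rotary_embedding import get_rope


def build_rotary_emb(
    qk_rope_head_dim: int,
    max_position_embeddings: int,
    rope_theta: float,
    rope_scaling: Optional[dict],
):
    if rope_scaling:
        # YaRN path: the config dict is routed to the DeepSeek YaRN variant.
        rope_scaling["rope_type"] = "deepseek_yarn"
    # rope_scaling=None yields pure RoPE without YaRN.
    return get_rope(
        qk_rope_head_dim,
        rotary_dim=qk_rope_head_dim,
        max_position=max_position_embeddings,
        base=rope_theta,
        is_neox_style=False,
        rope_scaling=rope_scaling,
    )
```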
FYI, pure RoPE with MLA support will be done in #12729
🚀 The feature, motivation and pitch
The Huggingface Deepseek-V2 model supports 1) the case when `n_routed_experts` is None (i.e., a fully dense model) and 2) pure RoPE without YaRN. These two features should be supported in vLLM for compatibility with the Huggingface modeling code.
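A hedged sketch of the dense-MLP guard this issue asks for, modeled on the layer-selection logic in deepseek_v2.py: if `n_routed_experts` is None, no layer should build the MoE block, so the whole model stays dense. `DeepseekConfigStub` is a hypothetical stand-in for the Huggingface `DeepseekV2Config` fields involved; the attribute names follow that config.

```python
# Sketch only: the guard on n_routed_experts is the addition requested here.
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeepseekConfigStub:
    """Hypothetical stand-in for the HF DeepseekV2Config fields used below."""
    n_routed_experts: Optional[int]
    first_k_dense_replace: int = 1
    moe_layer_freq: int = 1


def use_moe(config: DeepseekConfigStub, layer_idx: int) -> bool:
    # Fully dense model: the case this issue asks vLLM to handle.
    if config.n_routed_experts is None:
        return False
    # Existing MoE placement rule for DeepSeek-V2-style configs.
    return (
        layer_idx >= config.first_k_dense_replace
        and layer_idx % config.moe_layer_freq == 0
    )


dense = DeepseekConfigStub(n_routed_experts=None)
moe = DeepseekConfigStub(n_routed_experts=64)
assert not any(use_moe(dense, i) for i in range(27))  # every layer dense
assert use_moe(moe, 5)  # MoE layers kept when experts are configured
```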
Alternatives
No response
Additional context
No response