Skip to content

Conversation

Isotr0py
Copy link
Member

@Isotr0py Isotr0py commented Oct 21, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@mergify mergify bot added the deepseek Related to DeepSeek models label Oct 21, 2025
Signed-off-by: Isotr0py <[email protected]>
@mergify mergify bot added the new-model Requests to new models label Oct 21, 2025
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Copy link

mergify bot commented Oct 21, 2025

Documentation preview: https://vllm--27247.org.readthedocs.build/en/27247/

@mergify mergify bot added the documentation Improvements or additions to documentation label Oct 21, 2025
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@Isotr0py
Copy link
Member Author

Isotr0py commented Oct 21, 2025

The model should work now, but we really need a full clean for commented debug codes from upstream codebase everywhere. 😅

@ywang96
Copy link
Member

ywang96 commented Oct 21, 2025

The model should work now, but we really need a full clean for commented debug codes from upstream codebase everywhere. 😅

I'll clean it up - thanks for your work!

Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
@ywang96 ywang96 changed the title [WIP][Model] Upstream Deepseek-OCR model [Model] Upstream Deepseek-OCR model Oct 21, 2025
@ywang96 ywang96 marked this pull request as ready for review October 21, 2025 23:26
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if self.text_config.topk_method == "noaux_tc":
architectures = ["DeepseekV3ForCausalLM"]
elif not self.text_config.use_mla:
architectures = ["DeepseekForCausalLM"]
Copy link
Contributor

@whx-sjtu whx-sjtu Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use DeepseekV2ForCausalLM in non-mla scenario? Because DeepseekV2ForCausalLM also supports non-mla:

if model_config.use_mla:

Currently DeepseekForCausalLM is hard coded to use triton version of fused_experts and fused_topk with all_reduce path of MoE implementation, witch fails to utilize SharedFusedMoE witch supports CustomOp. So if there are reasons that DeepseekForCausalLM must be used, I suggest upgrade it to reuse SharedFusedMoE too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was needed back then when our DSv2 implementation didn't support non-mla (you can see the same code in dsvl2)

if self.text_config.topk_method == "noaux_tc":
architectures = ["DeepseekV3ForCausalLM"]
elif not self.text_config.use_mla:
architectures = ["DeepseekForCausalLM"]
else:
architectures = ["DeepseekV2ForCausalLM"]

but you're right this should be no longer needed.

@Ucas-HaoranWei
Copy link

Great work! I have tested it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deepseek Related to DeepSeek models documentation Improvements or additions to documentation new-model Requests to new models

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[New Model]: deepseek-ai/DeepSeek-OCR upstream support

4 participants