[Bug]: Speculative decoding reports errors when loading the target model using distributed inference (vLLM's official Ray setup) #12841
Comments
Hi @zhuohan123 @youkaichao, can I get some help please? This is for deploying DeepSeek in a distributed setup with speculative decoding due to limited resources. The above code example is just for reproducibility, using some smaller models.
cc @LiuXiaoxuanPKU for spec decode.
Hi @LiuXiaoxuanPKU, any chance you can help find the root cause? I also wanted to use #12915 to deploy DeepSeek with speculative decoding, but I only have two nodes of 8*H100 and hence need to resolve the bug above. The minimal reproducible code is also provided above, and it is not specific to the DeepSeek model.
Hi, I encountered the same error when trying to run #12915 with the DeepSeek-R1 model. My env:
It's OK to run the DeepSeek-R1 model without the MTP feature in the same distributed env. My startup command is
The error detail is
Looking forward to any suggestions or guidance, thanks a lot!
The root cause is these lines: vllm/model_executor/layers/spec_decode_base_sampler.py, lines 54 to 62 at commit 84683fa.
When we use multi-node inference with tp greater than 8, the device is not converted correctly from an int. You can work around it by manually changing the code from
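A minimal sketch of the kind of fix meant here, assuming the sampler builds its device string from the global rank; the helper name and the modulo mapping are illustrative, not the exact vLLM patch:

```python
import torch

# Hedged sketch of the workaround described above, not the exact vLLM diff.
# Assumption: init_tensors() receives the *global* rank as an int `device`
# and builds a string like f"cuda:{device}". With tp=16 across two 8-GPU
# nodes, rank 12 would produce "cuda:12", which does not exist on an 8-GPU
# node, so the tensor allocation fails.
def to_local_device(device, device_type: str = "cuda") -> str:
    if isinstance(device, int):
        # Map the global rank onto a GPU index that exists on this node.
        # torch.cuda.current_device() would work equally well on a worker
        # whose CUDA device has already been set.
        local_index = device % torch.cuda.device_count()
        return f"{device_type}:{local_index}"
    return device

# Example: rank 12 on the second 8-GPU node resolves to "cuda:4".
# print(to_local_device(12))
```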
Got it, thank you so much! Let me try and experiment.
Hi, have you replicated the inference acceleration effect after enabling MTP on multiple nodes? I have the same env (2 * 8 * H20) and MTP implementation but got low throughput (about 8.5 on average) and a long scoring_time (100+ ms).
Hello, what speculative-model do you use when using MTP, and how can I get it?
@yangchou19 "deepseek-ai/DeepSeek-R1".
Thank you, the command
Actually, you don't need to set up
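For context, a hedged sketch of how speculative decoding might be configured with these settings via vLLM's Python API of that era (the equivalent flags exist on the serve CLI). The values are assumptions drawn from the exchange above, not the commenters' exact commands, and per the last reply some of them may not be strictly required:

```python
from vllm import LLM

# Illustrative assumption: the MTP head ships inside the DeepSeek-R1
# checkpoint, so the speculative model points at the same repo, as the
# comment above suggests. Sizes and values are illustrative only.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1",
    trust_remote_code=True,
    tensor_parallel_size=16,                      # 2 nodes x 8 GPUs
    distributed_executor_backend="ray",
    speculative_model="deepseek-ai/DeepSeek-R1",  # MTP weights live in the R1 repo
    num_speculative_tokens=1,
)
```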
Your current environment

I set up multi-node inference following vLLM's official Ray setup, verified ray status being okay, and ran the following Python script within the container. The script loads the model using the VLLM class, and it reports an error.

🐛 Describe the bug
Reproducible code is simply below. Note: if you remove the speculative decoding arguments, the model can be loaded successfully.
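A hedged sketch of a script with this shape (the model names, parallel size, and speculative settings are illustrative assumptions, not the reporter's exact values):

```python
from vllm import LLM, SamplingParams

# Illustrative sketch only: smaller models stand in for DeepSeek, mirroring
# the note above that the repro does not depend on the DeepSeek model itself.
llm = LLM(
    model="facebook/opt-6.7b",
    tensor_parallel_size=16,                # spans two 8-GPU nodes
    distributed_executor_backend="ray",     # vLLM's Ray-based multi-node backend
    speculative_model="facebook/opt-125m",  # draft model; removing the speculative
    num_speculative_tokens=5,               # arguments lets the target model load
)

outputs = llm.generate(["The future of AI is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```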
The error message it gave is the following.