
Add support for stop_words in Ray MBridge deployment #605

Open
athitten wants to merge 2 commits into main from athitten/support_stop_word

Conversation

athitten (Contributor) commented on Feb 18, 2026

Extracts stop_words from the incoming request and exposes them in nemo_deploy/llm/megatronllm_deployable_ray.py and nemo_deploy/llm/megatronllm_deployable.py so they can be passed along to the mcore inference engine. Previously, generation continued well past the stop words supplied in incoming eval requests, producing many unnecessary tokens; honoring stop_words cuts that generation short and improves speed.

Speed improvement on a 10% GSM8K eval with Llama 3.2 1B:

Before stop_words support in deployment: 10 min
After stop_words support in deployment: 4 min 37 s
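
For context, a minimal sketch of the pattern described above, assuming the request arrives as a plain dict. The names here (SamplingConfig, build_sampling_config) are illustrative placeholders, not the actual NeMo Deploy or mcore interfaces:

```python
# Illustrative sketch only: SamplingConfig and build_sampling_config are
# hypothetical names, not the actual NeMo Deploy or mcore APIs.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SamplingConfig:
    """Hypothetical container for the sampling parameters handed to the engine."""
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 256
    stop_words: Optional[List[str]] = None


def build_sampling_config(request: dict) -> SamplingConfig:
    """Pull sampling parameters, including stop_words, out of an incoming request dict."""
    return SamplingConfig(
        temperature=request.get("temperature", 1.0),
        top_p=request.get("top_p", 1.0),
        max_tokens=request.get("max_tokens", 256),
        # Forwarding stop_words lets the engine stop as soon as a stop sequence
        # appears instead of generating tokens past it.
        stop_words=request.get("stop_words"),
    )


if __name__ == "__main__":
    request = {"prompt": "Q: 2+2=\nA:", "max_tokens": 64, "stop_words": ["\n\n", "Q:"]}
    print(build_sampling_config(request).stop_words)  # ['\n\n', 'Q:']
```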

Signed-off-by: Abhishree <abhishreetm@gmail.com>
copy-pr-bot (bot) commented on Feb 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

athitten (Contributor, Author)

/ok to test 7937894

Signed-off-by: Abhishree <abhishreetm@gmail.com>
athitten (Contributor, Author)

/ok to test 1fedc31
