Add support for Prithvi in Online serving mode #21518
Conversation
Code Review
This PR introduces support for Prithvi in online serving mode by allowing models to skip tokenizer initialization. The changes include modifications to the client, serving engine, serving pooling, and the Prithvi model itself. It's important to ensure that the tokenizer is checked for None before being used and that exceptions provide sufficient context for debugging.
    model_config=self.model_config,
    scheduler_config=engine_config.scheduler_config,
    lora_config=engine_config.lora_config)

self.input_preprocessor = InputPreprocessor(self.model_config,
                                            self.tokenizer)
If self.tokenizer is None, this will raise an AttributeError. It's critical to check self.tokenizer for None before it is used here to prevent a crash. Consider adding a condition to skip this line when self.tokenizer is None.
# Skip the preprocessor when the tokenizer was not initialised.
if self.tokenizer is not None:
    self.input_preprocessor = InputPreprocessor(self.model_config,
                                                self.tokenizer)
if "prompt_token_ids" not in request.additional_data: | ||
raise Exception("Request must contain " | ||
"additional_data['prompt_token_ids'] " | ||
"when the tokenizer is not initialised") |
This exception lacks context about why the tokenizer is not initialized. Add more detail to the exception message to aid debugging; for example, include the model name or a hint to check the --skip-tokenizer-init flag.
raise Exception("Request must contain "
"additional_data['prompt_token_ids'] "
"when the tokenizer is not initialised. Check if '--skip-tokenizer-init' flag was used correctly for model {}".format(request.model))
Please also fix pre-commit
Improve multimodal input handling logic (Co-authored-by: Cyrus Leung)
Force-pushed from e5dec1e to c2339d1
LGTM, thanks for adding support!
@DarkLight1337 thank you for the support during the review process.
PTAL at the failing entrypoints test.
Head branch was pushed to by a user without write access
Apologies about the test. I will keep an eye on it.
This PR builds on top of #20072 and enables the execution of Prithvi in online serving mode.
This is achieved by extending support for models that skip tokenizer initialisation.
A longer description of what we are trying to achieve is available in #20234.
This supersedes #20307.
Test Plan
The PR can be tested using the serving command and example request shown below.
Additional information
Prithvi in serving mode can be started with the following command:
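A sketch of such a command, assuming the checkpoint id ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11 and the embed pooling task (both are assumptions; only --skip-tokenizer-init is confirmed by this PR):

# Model id and --task are assumptions; --skip-tokenizer-init is the
# flag whose online-serving support this PR adds.
vllm serve ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11 \
    --task embed \
    --trust-remote-code \
    --skip-tokenizer-init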
The following script provides an example of a prompt that can be used to perform an inference (the same is used in the test linked above):
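A minimal sketch of such a request, assuming the server exposes the /pooling endpoint and accepts an additional_data field; everything other than additional_data["prompt_token_ids"] (which the review discussion above confirms is required) is an assumption for illustration:

import requests

# Hypothetical payload: only additional_data["prompt_token_ids"] is taken
# from this PR's discussion; the other fields and dummy values are
# illustrative. A real Prithvi request would also carry the geospatial
# image data, omitted here for brevity.
payload = {
    "model": "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11",
    "input": "dummy prompt",  # placeholder; the model consumes pixel data
    "additional_data": {"prompt_token_ids": [1]},
}

response = requests.post("http://localhost:8000/pooling", json=payload)
response.raise_for_status()
print(response.json())

Because the tokenizer is skipped, the server cannot tokenize the prompt itself, which is why the pre-computed prompt_token_ids must travel with the request.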
@DarkLight1337 @maxdebayser @njhill @christian-pinto