[Core] Support model loader plugins #21067
Conversation
Signed-off-by: 22quinn <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Code Review
This pull request introduces a plugin system for model loaders by creating a ModelLoaderRegistry, which is a great step towards making vLLM more extensible. The implementation correctly replaces the static LoadFormat enum with a dynamic, string-based registry.
My review has identified a critical security vulnerability related to dynamic module loading, a high-severity bug in an error-handling path, and a high-severity issue in the new tests concerning global state modification that could lead to flaky tests. Addressing these points will significantly improve the robustness and security of this new feature.
Instead of removing
Also, for the CLI could you make it so that if the type hint is a
@hmellor Thanks for the tips! I did an extensive refactoring and now it works the same way as quantization registration. Somehow the docs build keeps failing - it now requires adding all dependencies in
If no better workaround, I'm thinking to remove this check in
@hmellor I fixed mkdocs too, mind taking another look?
Sorry for the slow reply! I have had a lot of notifications since I came back from holiday and I must've missed this one. I'll try to find time for this next week!
This pull request has merge conflicts that must be resolved before it can be merged.
This is a really nice PR!
I'll approve from my perspective, but will ask for a review from another maintainer to make sure.
vllm/engine/arg_utils.py
I'll make a follow up PR with my suggestion to parse Union[str, Literal[...]] into add_argument(..., metavar=",".join(get_args(literal)))
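As a rough illustration of that suggestion, the sketch below pulls the Literal member out of a Union[str, Literal[...]] hint and uses its values as the argparse metavar, while still accepting arbitrary strings; the names here are illustrative, not vLLM's actual CLI code.

```python
import argparse
from typing import Literal, Union, get_args, get_origin

# Illustrative type hint: known formats plus arbitrary plugin-supplied strings.
LoadFormats = Literal["auto", "pt", "safetensors"]
hint = Union[str, LoadFormats]

# Find the Literal member of the Union and collect its string values.
literal = next(a for a in get_args(hint) if get_origin(a) is Literal)

parser = argparse.ArgumentParser()
# metavar lists the known choices in --help, but no choices= restriction
# is applied, so custom load formats registered by plugins still pass.
parser.add_argument("--load-format", metavar=",".join(get_args(literal)))

args = parser.parse_args(["--load-format", "my_custom_loader"])
print(args.load_format)  # my_custom_loader
```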
def get_supported_load_formats() -> set[str]:
    return set(_LOAD_FORMAT_TO_MODEL_LOADER.keys())
I don't expect this to be on the critical path, but since _LOAD_FORMAT_TO_MODEL_LOADER is a dict we could just do load_format in _LOAD_FORMAT_TO_MODEL_LOADER to check if it's supported, which will be faster than constructing a new set from the keys.
yeah or just use an enum/class to group Loader functionalities.
They use a dict so that it can have more loaders dynamically added (the same as for quantization methods)
Thinking again, I feel this function is not necessary as it's only used in get_model_loader. I've removed it.
Nice one!
I think the LoadFormat could use a tiny wrapper to group a couple of functionalities imo (add and check).
Docs are missing.
PS is this really about registering like so https://docs.vllm.ai/en/latest/design/plugin_system.html?
LoadFormats = Literal[
    "auto",
    "bitsandbytes",
    "dummy",
    "fastsafetensors",
    "gguf",
    "mistral",
    "npcache",
    "pt",
    "runai_streamer",
    "runai_streamer_sharded",
    "safetensors",
    "sharded_state",
    "tensorizer",
]
_LOAD_FORMAT_TO_MODEL_LOADER: dict[str, type[BaseModelLoader]] = {
    "auto": DefaultModelLoader,
    "bitsandbytes": BitsAndBytesModelLoader,
    "dummy": DummyModelLoader,
    "fastsafetensors": DefaultModelLoader,
    "gguf": GGUFModelLoader,
    "mistral": DefaultModelLoader,
    "npcache": DefaultModelLoader,
    "pt": DefaultModelLoader,
    "runai_streamer": RunaiModelStreamerLoader,
    "runai_streamer_sharded": ShardedStateLoader,
    "safetensors": DefaultModelLoader,
    "sharded_state": ShardedStateLoader,
    "tensorizer": TensorizerLoader,
}
This could've remained an enum and would've supported a to_model_loader method.
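For concreteness, a minimal sketch of the enum alternative being suggested (the loader classes here are empty stand-ins; vLLM kept the dict instead so plugins can add loaders dynamically):

```python
from enum import Enum

# Stand-in loader classes for illustration only.
class DefaultModelLoader: ...
class GGUFModelLoader: ...

class LoadFormat(str, Enum):
    AUTO = "auto"
    GGUF = "gguf"

    def to_model_loader(self) -> type:
        # Map each enum member to its loader class.
        return {
            LoadFormat.AUTO: DefaultModelLoader,
            LoadFormat.GGUF: GGUFModelLoader,
        }[self]

print(LoadFormat("gguf").to_model_loader().__name__)  # GGUFModelLoader
```

The trade-off is that enum members are fixed at class-definition time, which is why a plain dict suits dynamic plugin registration better.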
Personally, I do prefer using a Literal as it makes for nicer type hinting.
The way that @22quinn has organised the typing and the registry is exactly the same as for quantization methods. If we change one we should probably change both? Maybe as a follow up task to improve the way we handle built-in plugins in general?
Thanks for the review! I don't have a strong opinion for this, but agree we'd better be consistent everywhere. I'm leaving it as Literal for now.
Yeah let's keep them consistent for now
Yes, it's about registering as an external plugin. For the docstring, it's in
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.

Purpose
In RLHF use cases, internal customized trainers and checkpoints may be used. To support these customizations, we also need to inject a custom vLLM model loader.
This PR makes vLLM more extensible by allowing plugin registration for model loaders.
Example:
Test Plan
Test Result
Unit test passed
(Optional) Documentation Update