[Core] Support model loader plugins #21067

Merged: 11 commits merged into vllm-project:main on Jul 24, 2025

Conversation

@22quinn (Collaborator) commented Jul 16, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

In RLHF use cases, internal customized trainers and checkpoint formats may be used. To support these customizations, we also need a way to inject a custom vLLM model loader.
This PR makes model loading more extensible by allowing plugin registration of model loaders.

Example:

from vllm.config import LoadConfig
from vllm.model_executor.model_loader import get_model_loader, register_model_loader
from vllm.model_executor.model_loader.base_loader import BaseModelLoader

@register_model_loader("my_loader")
class MyModelLoader(BaseModelLoader):
    def download_model(self):
        pass

    def load_weights(self):
        pass

load_config = LoadConfig(load_format="my_loader")
print(type(get_model_loader(load_config)))
# Output: <class 'MyModelLoader'>

Test Plan

pytest tests/model_executor/model_loader/test_registry.py

Test Result

Unit tests passed

(Optional) Documentation Update

Signed-off-by: 22quinn <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a plugin system for model loaders by creating a ModelLoaderRegistry, which is a great step towards making vLLM more extensible. The implementation correctly replaces the static LoadFormat enum with a dynamic, string-based registry.

My review has identified a critical security vulnerability related to dynamic module loading, a high-severity bug in an error-handling path, and a high-severity issue in the new tests concerning global state modification that could lead to flaky tests. Addressing these points will significantly improve the robustness and security of this new feature.

@hmellor (Member) commented Jul 16, 2025

Instead of removing LoadFormat, could you create a Literal type and a loader dictionary, similar to how it's done for QuantizationMethods?

Also, for the CLI, could you make it so that if the type hint is a Union[str, Literal[...]], the contents of the Literal go in metavar? That way the user gets a good hint at the options but isn't blocked from passing arbitrary strings for out-of-tree plugins.
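
A minimal sketch of that CLI idea, assuming a hypothetical metavar_for helper and an abbreviated LoadFormats literal (this is not vLLM's actual argument-parsing code):

import argparse
from typing import Literal, Union, get_args, get_origin

LoadFormats = Literal["auto", "safetensors", "gguf"]  # abbreviated for the sketch

def metavar_for(hint) -> str:
    # For Union[str, Literal[...]]: surface the Literal choices in the metavar
    # without using choices=, so out-of-tree plugin names still pass validation.
    literals = [arg for arg in get_args(hint) if get_origin(arg) is Literal]
    return ",".join(str(v) for lit in literals for v in get_args(lit))

parser = argparse.ArgumentParser()
parser.add_argument("--load-format", type=str, default="auto",
                    metavar=metavar_for(Union[str, LoadFormats]))
# --help now renders: --load-format auto,safetensors,gguf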

22quinn added 7 commits July 16, 2025 19:10
Signed-off-by: 22quinn <[email protected]>
Signed-off-by: 22quinn <[email protected]>
Signed-off-by: 22quinn <[email protected]>
Signed-off-by: 22quinn <[email protected]>
@mergify mergify bot added the ci/build label Jul 17, 2025
@22quinn (Collaborator, Author) commented Jul 17, 2025

@hmellor Thanks for the tips! I did an extensive refactoring and now it works the same way as quantization registration.

Somehow the docs build keeps failing: it now requires adding all dependencies imported in model_loader.__init__.py to requirements/docs.txt. We could perhaps add all of them, but is there a better way to handle this?

If there's no better workaround, I'm thinking of removing this check in __post_init__: if self.load_format not in model_loader.get_supported_load_formats(). It will be checked in get_model_loader anyway.

Signed-off-by: 22quinn <[email protected]>
@22quinn 22quinn added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 17, 2025
@22quinn (Collaborator, Author) commented Jul 18, 2025

@hmellor I fixed mkdocs too; mind taking another look?
CI passed and the failing tests look unrelated; this might need a force merge if everything looks good.

@hmellor (Member) commented Jul 18, 2025

Sorry for the slow reply! I have had a lot of notifications since I came back from holiday and I must've missed this one.

I'll try to find time for this next week!


mergify bot commented Jul 23, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @22quinn.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 23, 2025
@hmellor (Member) left a comment

This is a really nice PR!

I'll approve from my perspective, but will ask for a review from another maintainer to make sure.

@hmellor (Member) commented:

I'll make a follow-up PR with my suggestion to parse Union[str, Literal[...]] into add_argument(..., metavar=",".join(get_args(literal))).



def get_supported_load_formats() -> set[str]:
    return set(_LOAD_FORMAT_TO_MODEL_LOADER.keys())
@hmellor (Member) commented:
I don't expect this to be on the critical path, but since _LOAD_FORMAT_TO_MODEL_LOADER is a dict we could just do load_format in _LOAD_FORMAT_TO_MODEL_LOADER to check whether it's supported, which is faster than constructing a new set from the keys.
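
For illustration, a hedged sketch of what that check could look like inside get_model_loader (names follow the diff above; the merged code may differ):

def get_model_loader(load_config: LoadConfig) -> BaseModelLoader:
    load_format = load_config.load_format
    # Dict membership is O(1); no need to build a set from the keys first.
    if load_format not in _LOAD_FORMAT_TO_MODEL_LOADER:
        raise ValueError(f"Unknown load format: {load_format!r}")
    return _LOAD_FORMAT_TO_MODEL_LOADER[load_format](load_config)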

@NickLucche (Contributor) commented:
yeah or just use an enum/class to group Loader functionalities.

@hmellor (Member) commented:
They use a dict so that it can have more loaders dynamically added (the same as for quantization methods)

@22quinn (Collaborator, Author) commented:
Thinking again, I feel this function is not necessary, as it's only used in get_model_loader. I've removed it.

@NickLucche (Contributor) left a comment

Nice one!

I think the LoadFormat could use a tiny wrapper to group a couple of functionalities imo (add and check).
Docs are missing.

PS: is this really about registering as in https://docs.vllm.ai/en/latest/design/plugin_system.html?

Comment on lines +26 to +55
LoadFormats = Literal[
    "auto",
    "bitsandbytes",
    "dummy",
    "fastsafetensors",
    "gguf",
    "mistral",
    "npcache",
    "pt",
    "runai_streamer",
    "runai_streamer_sharded",
    "safetensors",
    "sharded_state",
    "tensorizer",
]

_LOAD_FORMAT_TO_MODEL_LOADER: dict[str, type[BaseModelLoader]] = {
    "auto": DefaultModelLoader,
    "bitsandbytes": BitsAndBytesModelLoader,
    "dummy": DummyModelLoader,
    "fastsafetensors": DefaultModelLoader,
    "gguf": GGUFModelLoader,
    "mistral": DefaultModelLoader,
    "npcache": DefaultModelLoader,
    "pt": DefaultModelLoader,
    "runai_streamer": RunaiModelStreamerLoader,
    "runai_streamer_sharded": ShardedStateLoader,
    "safetensors": DefaultModelLoader,
    "sharded_state": ShardedStateLoader,
    "tensorizer": TensorizerLoader,
}
@NickLucche (Contributor) commented:
This could've remained an enum and would've supported a to_model_loader method.
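
A sketch of the enum alternative being suggested, for comparison only (the PR keeps the Literal-plus-dict approach; names here mirror the diff above):

import enum

class LoadFormat(str, enum.Enum):
    AUTO = "auto"
    GGUF = "gguf"
    # ... remaining formats elided

    def to_model_loader(self) -> type[BaseModelLoader]:
        # Grouping the lookup with the type keeps add/check/resolve in one place.
        return _LOAD_FORMAT_TO_MODEL_LOADER[self.value]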

@hmellor (Member) commented:
Personally, I do prefer using a Literal as it makes for nicer type hinting.

The way that @22quinn has organised the typing and the registry is exactly the same as for quantization methods. If we change one we should probably change both? Maybe as a follow-up task to improve the way we handle built-in plugins in general?

@22quinn (Collaborator, Author) commented:
Thanks for the review! I don't have a strong opinion on this, but I agree we'd better be consistent everywhere. I'm leaving it as Literal for now.

@hmellor (Member) commented:
Yeah let's keep them consistent for now
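
Stepping back, a minimal sketch of how a registration decorator could extend the _LOAD_FORMAT_TO_MODEL_LOADER dict shown above (the merged register_model_loader may differ in details such as validation and typing):

def register_model_loader(load_format: str):
    # Class decorator that records a BaseModelLoader subclass under the
    # given load format, making it resolvable via get_model_loader.
    def decorator(cls: type[BaseModelLoader]) -> type[BaseModelLoader]:
        if load_format in _LOAD_FORMAT_TO_MODEL_LOADER:
            raise ValueError(f"Load format {load_format!r} is already registered")
        _LOAD_FORMAT_TO_MODEL_LOADER[load_format] = cls
        return cls
    return decorator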

Signed-off-by: 22quinn <[email protected]>
@mergify mergify bot removed the needs-rebase label Jul 24, 2025
@22quinn (Collaborator, Author) commented Jul 24, 2025

> Nice one!
>
> I think the LoadFormat could use a tiny wrapper to group a couple of functionalities imo (add and check). Docs are missing.
>
> PS: is this really about registering as in https://docs.vllm.ai/en/latest/design/plugin_system.html?

Yes, it's about registering as an external plugin. For the docstring, it's in LoadConfig, similar to QuantizationMethods. I've added a reminder to update the docstring if a new load format is added.
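
For reference, one way an out-of-tree package could hook such a loader into vLLM's plugin mechanism (a sketch: my_loader_plugin is a hypothetical package, and the entry-point group name "vllm.general_plugins" is taken from the plugin-system docs linked above):

# setup.py of the hypothetical out-of-tree package
from setuptools import setup

setup(
    name="my_loader_plugin",
    entry_points={
        "vllm.general_plugins": [
            "register_my_loader = my_loader_plugin:register_my_loader",
        ]
    },
)

Here register_my_loader would simply import the module containing the @register_model_loader-decorated class, so the registration runs when vLLM loads its plugins.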

@vllm-bot merged commit 610852a into vllm-project:main on Jul 24, 2025
69 of 70 checks passed
Labels: ci/build, ready (ONLY add when PR is ready to merge/full CI is needed)
5 participants