[Core] Support model loader plugins #21067
Conversation
Signed-off-by: 22quinn <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Code Review
This pull request introduces a plugin system for model loaders by creating a ModelLoaderRegistry, which is a great step towards making vLLM more extensible. The implementation correctly replaces the static LoadFormat enum with a dynamic, string-based registry.
My review has identified a critical security vulnerability related to dynamic module loading, a high-severity bug in an error-handling path, and a high-severity issue in the new tests concerning global state modification that could lead to flaky tests. Addressing these points will significantly improve the robustness and security of this new feature.
Instead of removing
Also, for the CLI could you make it so that if the type hint is a
@hmellor Thanks for the tips! I did an extensive refactoring and now it works the same way as quantization registration. Somehow the docs build keeps failing - it now requires adding all dependencies in
If no better workaround, I'm thinking to remove this check in
@hmellor I fixed mkdocs too, mind taking another look?
Sorry for the slow reply! I have had a lot of notifications since I came back from holiday and I must've missed this one. I'll try to find time for this next week!
This pull request has merge conflicts that must be resolved before it can be merged.
This is a really nice PR!
I'll approve from my perspective, but will ask for a review from another maintainer to make sure.
vllm/engine/arg_utils.py
I'll make a follow up PR with my suggestion to parse Union[str, Literal[...]] into add_argument(..., metavar=",".join(get_args(literal)))
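As a rough illustration of that suggestion, the sketch below pulls the Literal member out of a Union[str, Literal[...]] hint and uses its values as the argparse metavar, while still accepting arbitrary strings; the names here are illustrative, not vLLM's actual CLI code.

```python
import argparse
from typing import Literal, Union, get_args, get_origin

# Illustrative type hint: known formats plus arbitrary plugin-supplied strings.
LoadFormats = Literal["auto", "pt", "safetensors"]
hint = Union[str, LoadFormats]

# Find the Literal member of the Union and collect its string values.
literal = next(a for a in get_args(hint) if get_origin(a) is Literal)

parser = argparse.ArgumentParser()
# metavar lists the known choices in --help, but no choices= restriction
# is applied, so custom load formats registered by plugins still pass.
parser.add_argument("--load-format", metavar=",".join(get_args(literal)))

args = parser.parse_args(["--load-format", "my_custom_loader"])
print(args.load_format)  # my_custom_loader
```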
def get_supported_load_formats() -> set[str]:
    return set(_LOAD_FORMAT_TO_MODEL_LOADER.keys())
I don't expect this to be on the critical path, but since _LOAD_FORMAT_TO_MODEL_LOADER is a dict we could just do load_format in _LOAD_FORMAT_TO_MODEL_LOADER to check if it's supported, which will be faster than constructing a new set from the keys.
yeah or just use an enum/class to group Loader functionalities.
They use a dict so that it can have more loaders dynamically added (the same as for quantization methods)
Thinking again, I feel this function is not necessary as it's only used in get_model_loader. I've removed it.
Nice one!
I think the LoadFormat could use a tiny wrapper to group a couple of functionalities imo (add and check).
Docs are missing.
PS is this really about registering like so https://docs.vllm.ai/en/latest/design/plugin_system.html?
LoadFormats = Literal[
    "auto",
    "bitsandbytes",
    "dummy",
    "fastsafetensors",
    "gguf",
    "mistral",
    "npcache",
    "pt",
    "runai_streamer",
    "runai_streamer_sharded",
    "safetensors",
    "sharded_state",
    "tensorizer",
]
_LOAD_FORMAT_TO_MODEL_LOADER: dict[str, type[BaseModelLoader]] = {
    "auto": DefaultModelLoader,
    "bitsandbytes": BitsAndBytesModelLoader,
    "dummy": DummyModelLoader,
    "fastsafetensors": DefaultModelLoader,
    "gguf": GGUFModelLoader,
    "mistral": DefaultModelLoader,
    "npcache": DefaultModelLoader,
    "pt": DefaultModelLoader,
    "runai_streamer": RunaiModelStreamerLoader,
    "runai_streamer_sharded": ShardedStateLoader,
    "safetensors": DefaultModelLoader,
    "sharded_state": ShardedStateLoader,
    "tensorizer": TensorizerLoader,
}
This could've remained an enum and would've supported a to_model_loader method.
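For concreteness, a minimal sketch of the enum alternative being suggested (the loader classes here are empty stand-ins; vLLM kept the dict instead so plugins can add loaders dynamically):

```python
from enum import Enum

# Stand-in loader classes for illustration only.
class DefaultModelLoader: ...
class GGUFModelLoader: ...

class LoadFormat(str, Enum):
    AUTO = "auto"
    GGUF = "gguf"

    def to_model_loader(self) -> type:
        # Map each enum member to its loader class.
        return {
            LoadFormat.AUTO: DefaultModelLoader,
            LoadFormat.GGUF: GGUFModelLoader,
        }[self]

print(LoadFormat("gguf").to_model_loader().__name__)  # GGUFModelLoader
```

The trade-off is that enum members are fixed at class-definition time, which is why a plain dict suits dynamic plugin registration better.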
Personally, I do prefer using a Literal as it makes for nicer type hinting.
The way that @22quinn has organised the typing and the registry is exactly the same as for quantization methods. If we change one we should probably change both? Maybe as a follow up task to improve the way we handle built-in plugins in general?
Thanks for the review! I don't have a strong opinion for this, but agree we'd better be consistent everywhere. I'm leaving it as Literal for now.
Yeah let's keep them consistent for now
Yes, it's about registering as an external plugin. For the docstring, it's in
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.

Purpose
In RLHF use cases, internal customized trainers and checkpoints may be used. To support these customizations, we also need to inject a custom vLLM model loader.
This PR makes vLLM more extensible by allowing plugin registration for model loaders.
Example:
Test Plan
Test Result
Unit test passed
(Optional) Documentation Update