
👨‍👩‍👧 GRPO + PEFT + vLLM #2818

Merged 5 commits into huggingface:main on Feb 13, 2025

Conversation

winglian (Contributor)

What does this PR do?

Unlocks PEFT + GRPO + vLLM without the complexity of shipping LoRA weights to vLLM via the REST API. This implementation simply merges the LoRA weights into the base model and ships the merged weights to vLLM using the existing Python API.
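
A rough sketch of the approach (the helper name and the way the vLLM-side model handle is obtained are assumptions for illustration, not the exact PR code):

    # Sketch only: merge the PEFT adapter into the base weights, strip the
    # PEFT-specific prefixes from the state dict, then hand plain weights to vLLM.
    from peft import PeftModel

    def get_merged_state_dict(model: PeftModel) -> dict:
        model.merge_adapter()  # fold the LoRA/DoRA weights into the base weights
        state_dict = {
            k.removeprefix("base_model.model.").replace(".base_layer", ""): v
            for k, v in model.state_dict().items()
            if model.prefix not in k  # drop the adapter tensors themselves
        }
        model.unmerge_adapter()  # restore the trainable adapter afterwards
        return state_dict

    # On the vLLM side, the merged weights can then be loaded in-process, e.g.
    # llm_model.load_weights(get_merged_state_dict(peft_model).items()), where
    # llm_model is the model object held by the vLLM engine.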

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Comment on lines 954 to 955
weight_key = key.replace(base_model_prefix, "") + ".weight"
bias_key = key.replace(base_model_prefix, "") + ".bias"
winglian (Contributor Author):

I know this is pretty janky, so would love feedback on making it better.

Contributor:

Does it work to iterate through model.base_model.model.named_modules() at L947 to get the named parameters w/o the "model.base_model" prefix?
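
For illustration, a sketch of what that suggestion amounts to (assuming model is the PEFT-wrapped model under discussion):

    # Sketch: iterating from the inner base model yields parameter names without
    # the "base_model.model." prefix that the PeftModel wrapper adds, e.g.
    # "model.layers.0.self_attn.q_proj.base_layer.weight".
    names = [name for name, _ in model.base_model.model.named_parameters()]
    assert not any(name.startswith("base_model.") for name in names)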

Comment on lines 970 to 978
if any(
    skip in key
    for skip in [
        ".original_module",
        ".modules_to_save",
        ".base_layer",
    ]
):
    continue
winglian (Contributor Author):

Same for this.

@qgallouedec (Member):

@winglian just to point out a different approach: #2730

@winglian (Contributor Author):

@qgallouedec The downside there is that you're limited to the LoRA support in vLLM, which means no DoRA support. With this approach, almost any PEFT adapter type can be used. And while LoRA does converge pretty quickly compared to full-parameter training, DoRA seems to be more performant.

[Screenshot attached: 2025-02-10 at 9:25 AM]
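
For context, a sketch of what enabling DoRA looks like on the PEFT side (values are illustrative, not taken from this PR):

    from peft import LoraConfig

    # Sketch: because the merge happens in PEFT before the weights are shipped
    # to vLLM, adapter variants such as DoRA only need the config flag; vLLM
    # never sees the adapter itself.
    dora_config = LoraConfig(
        r=64,
        lora_alpha=64,  # illustrative values
        target_modules="all-linear",
        use_dora=True,
    )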

@qgallouedec (Member):

This seems quite reasonable, thank you for the clear explanation.

@qgallouedec (Member) commented Feb 10, 2025:

Another pointer that could be useful:

> It is possible to call model.merge_adapter (optionally with the adapter_names argument), then model.state_dict(), then model.unmerge_adapter.
> The state_dict may require some cleanup though, depending on what you need to do with it (I couldn't infer that from the PR).
> By cleanup, I mean: after merge_and_unload the model looks like the base model, but merge_adapter keeps the LoRA structure, with the wrapped base model, LoRA weights, etc. still present in the state_dict.

From @BenjaminBossan

@winglian (Contributor Author):

I tried

                unwrapped_model.merge_and_unload()
                state_dict = unwrapped_model.base_model.model.state_dict()
                unwrapped_model.unmerge_adapter()

but the resulting state dict still has the base_model.model. prefix.

@BenjaminBossan (Member) left a comment:

Thanks a lot for this PR.

To elaborate on the quote by Quentin, the steps would be:

  1. Call model.merge_adapter().
  2. Get the state_dict of the merged model.
  3. Clean up the state_dict: since the base weights already contain the merged LoRA weights, we can remove all LoRA weights.
  4. Call model.unmerge_adapter() if we need to restore the previous state (note that unmerge_adapter unmerges all adapters, so if some were already merged before step 1, they need to be re-merged, but it's probably not relevant here).

Here is a small demonstration in code:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(model_id)
config = LoraConfig()
model = get_peft_model(model, config)

model.merge_adapter()  # merge the LoRA weights into the base weights
sd = model.state_dict()
# Drop the LoRA tensors themselves (model.prefix == "lora_") and strip the PEFT
# wrapper prefixes so the keys match the plain base model.
new_sd = {
    k.removeprefix("base_model.model.").replace(".base_layer", ""): v
    for k, v in sd.items()
    if model.prefix not in k
}
model.unmerge_adapter()  # restore the unmerged adapter state
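
As a quick sanity check (an added sketch, assuming there are no modules_to_save), the cleaned keys should load directly into the plain base model:

    # Sketch: the cleaned state dict should line up 1:1 with the un-wrapped base model.
    base = AutoModelForCausalLM.from_pretrained(model_id)
    missing, unexpected = base.load_state_dict(new_sd, strict=False)
    print(missing, unexpected)  # both should be empty if the cleanup worked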

@qgallouedec (Member) commented Feb 10, 2025:

I've added the suggested modification to this branch: #2725. It seems to work...! EDIT: DoRA included

@BenjaminBossan (Member):

> I've added the suggested modification to this branch: #2725. It seems to work...! EDIT: DoRA included

Nice, I added a comment there. Hopefully, one of these branches can be merged soon :)

@winglian (Contributor Author):

I redid this PR to account for the other changes, and also updated the test to use LoRA.

@BenjaminBossan (Member) left a comment:

Not sure if this PR is still required after #2725 has been merged, but I did a quick review just in case.

Comment on lines 499 to 500
k.removeprefix("base_model.model.")
.removeprefix("base_model.model.")
Member:

Duplicate, you can remove the 2nd line.

k.removeprefix("base_model.model.").replace(".base_layer", ""): v
k.removeprefix("base_model.model.")
.removeprefix("base_model.model.")
.replace(".default", "")
Member:

I'll leave the same comment as I did on #2725:

Note here that the adapter name can be different from "default". You could get the adapter name from model.active_adapters, which is a list of all active adapters. I assume in this context, there can only ever be one (raise an error when more?), so taking the first item should work.
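
A sketch of that suggestion (with sd being the merged state_dict from the demo above, and assuming exactly one active adapter):

    # Sketch: resolve the adapter name instead of hard-coding "default".
    active_adapters = model.active_adapters
    if len(active_adapters) != 1:
        raise ValueError(f"Expected exactly one active adapter, got {active_adapters}")
    adapter_name = active_adapters[0]

    new_sd = {
        k.removeprefix("base_model.model.")
         .replace(".base_layer", "")
         .replace(f".{adapter_name}", ""): v
        for k, v in sd.items()
        if model.prefix not in k
    }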

@qgallouedec (Member):

Thanks for the follow-up, @BenjaminBossan!

@@ -249,7 +249,7 @@ def __init__(
         # Reference model
         if is_deepspeed_zero3_enabled():
             self.ref_model = AutoModelForCausalLM.from_pretrained(model_id, **model_init_kwargs)
-        elif peft_config is None:
+        elif not is_peft_model(model):
Member:

This allows supporting a model that is already wrapped by PEFT.
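
For illustration, a sketch of what this enables (the trainer arguments are placeholders borrowed from the examples in this thread):

    from peft import LoraConfig, get_peft_model

    # Sketch: the model can be wrapped with PEFT up front and passed directly,
    # without also passing peft_config to the trainer.
    peft_model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32))
    trainer = GRPOTrainer(
        model=peft_model,
        reward_funcs=[format_reward],  # placeholder reward function
        args=training_args,            # placeholder GRPOConfig
        train_dataset=dataset,
    )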

@qgallouedec changed the title from "GRPO + PEFT + vLLM" to "👨‍👩‍👧 GRPO + PEFT + vLLM" on Feb 13, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec merged commit 5c9cf20 into huggingface:main on Feb 13, 2025
@mehdiataei:

Using a Qwen1.5 instruct model, I get the following error:


[rank0]:     trainer.train()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2171, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2531, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3669, in training_step
[rank0]:     inputs = self._prepare_inputs(inputs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/trl/trainer/grpo_trainer.py", line 535, in _prepare_inputs
[rank0]:     self._move_model_to_vllm()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/trl/trainer/grpo_trainer.py", line 515, in _move_model_to_vllm
[rank0]:     llm_model.load_weights(state_dict.items())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 515, in load_weights
[rank0]:     return loader.load_weights(weights)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 235, in load_weights
[rank0]:     autoloaded_weights = set(self._load_module("", self.module, weights))
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 224, in _load_module
[rank0]:     raise ValueError(msg)
[rank0]: ValueError: There is no module or parameter named 'base_model' in Qwen2ForCausalLM

The trainer is set up as:

    trainer = GRPOTrainer(
        model=model,
        reward_funcs=[format_reward, judge_reward],
        args=training_args,
        train_dataset=dataset,
        peft_config=lora_config,
    )

with the following vLLM settings:

    use_vllm=True,                       # Whether to use vLLM for faster generation (default: False)
    vllm_device="cuda:7",                   # Device for vLLM generation (e.g., "cuda:1"); "auto" selects the next available GPU
    vllm_gpu_memory_utilization=0.4,       # Fraction of GPU memory to reserve for vLLM (default: 0.9)
    vllm_dtype="auto",                    # Data type for vLLM generation; "auto" lets vLLM decide based on model config
    vllm_max_model_len=512,              # Optional maximum model length for vLLM; if None, uses the model's context size

Another weird thing I noticed:

INFO 02-13 17:12:27 model_runner.py:1115] Loading model weights took 0.0000 GB
INFO 02-13 17:12:28 worker.py:267] Memory profiling takes 0.48 seconds
INFO 02-13 17:12:28 worker.py:267] the current vLLM instance can use total_gpu_memory (39.39GiB) x gpu_memory_utilization (0.40) = 15.76GiB
INFO 02-13 17:12:28 worker.py:267] model weights take 0.00GiB; non_torch_memory takes 0.00GiB; PyTorch activation peak memory takes 0.00GiB; the rest of the memory reserved for KV Cache is 15.76GiB

Why do the model weights take 0.00 GiB?

@zaddy6 commented Feb 14, 2025:

I noticed training without LoRA leads to better performance. Here is an example: without LoRA it starts to max out the rewards at 1k steps; with LoRA it doesn't learn.
[image attached]

@winglian (Contributor Author):

> I noticed training without LoRA leads to better performance. Here is an example: without LoRA it starts to max out the rewards at 1k steps; with LoRA it doesn't learn.

What rank and dataset? It learns pretty quickly with rank 64 on the gsm8k dataset.

@zaddy6 commented Feb 14, 2025:

> I noticed training without LoRA leads to better performance. Here is an example: without LoRA it starts to max out the rewards at 1k steps; with LoRA it doesn't learn.
>
> What rank and dataset? It learns pretty quickly with rank 64 on the gsm8k dataset.

Current config:

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.05,
    use_dora=True,
)

What do you use as your alpha?

@wusijie123:

The total steps changed when using this code:
[image attached]
Why is Total optimization steps = Num examples * Num Epochs / Gradient Accumulation steps? In the previous version, Total optimization steps = Num examples * Num Epochs / Total train batch size.

By the way, even using LoRA, most of the time I am waiting for vLLM to generate results. Is there any way to speed up the generation?
