✨ Add vLLM guided decoding support to GRPO Trainer #2811

kldzj · 2025-02-10T00:09:32Z

What does this PR do?

Adds the ability to pass vLLM's GuidedDecodingParams through to the llm.generate call.

Example:

from trl import GRPOConfig, GRPOTrainer
from vllm.sampling_params import GuidedDecodingParams

training_args = GRPOConfig(
    use_vllm=True,
    vllm_guided_decoding_params=GuidedDecodingParams(
        backend="outlines",
        regex="<reasoning>\n.*\n</reasoning>\n<answer>\n.*\n</answer>",
    ),
    # ...
)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

qgallouedec · 2025-02-10T14:15:24Z

Thanks for contributing @kldzj! For the record, can you briefly explain the motivation behind using GuidedDecodingParams?

HuggingFaceDocBuilderDev · 2025-02-10T14:18:19Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

kldzj · 2025-02-10T14:20:53Z

@qgallouedec When using the GRPO trainer, we likely want the model to respond in a specific format. In the example above, we enforce the <reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer> format right away, instead of spending many training steps waiting for the model to learn the correct format through our reward functions.

Let me know if there's any problem or flaw in my logic with this.
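
To make the format constraint above concrete, here is a minimal sketch that checks completions against the same regex with Python's built-in `re` module. This is only an illustration: `follows_format` and the sample strings are hypothetical, and the guided-decoding backend may interpret the pattern differently from `re`.

```python
import re

# Same pattern as in the PR description. Python's `re` is used purely as a
# stand-in here; it is not the engine that vLLM's guided decoding uses.
FORMAT_REGEX = "<reasoning>\n.*\n</reasoning>\n<answer>\n.*\n</answer>"


def follows_format(completion: str) -> bool:
    """Return True if the entire completion matches the expected layout."""
    return re.fullmatch(FORMAT_REGEX, completion) is not None


# Hypothetical sample completions.
good = "<reasoning>\n2 + 2 is 4\n</reasoning>\n<answer>\n4\n</answer>"
bad = "The answer is 4."
multi = "<reasoning>\nline one\nline two\n</reasoning>\n<answer>\n4\n</answer>"

print(follows_format(good))
print(follows_format(bad))
print(follows_format(multi))
```

Note that in Python's `re`, `.` does not match a newline by default, so the multi-line reasoning sample is rejected in this sketch. Whether the guided-decoding backend treats `.` the same way is not stated in this thread, so it is worth checking before relying on multi-line reasoning.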

qgallouedec · 2025-02-10T16:59:20Z

That's very interesting.
Regarding the implementation, it's a bit annoying because GuidedDecodingParams isn't JSON serializable, so it causes an error. A fair alternative is to expose only the regex, like this:

@dataclass
class GRPOConfig(TrainingArguments):
    ...
    vllm_guided_decoding_regex: Optional[str] = None

and

if args.vllm_guided_decoding_regex is not None:
    guided_decoding = GuidedDecodingParams(backend="outlines", regex=args.vllm_guided_decoding_regex)
else:
    guided_decoding = None

It's less flexible, but it explicitly exposes the regex and is probably easier for the user.
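
The serialization concern can be illustrated with a small standard-library sketch. Everything here is hypothetical: `FakeGuidedParams` stands in for a params object that `json` cannot encode, and `to_json` only mimics a config dump that hands field values straight to `json.dumps`. It is not the actual TRL or transformers code path.

```python
import json
from dataclasses import dataclass
from typing import Optional


class FakeGuidedParams:
    """Hypothetical stand-in for a params object that json cannot encode."""

    def __init__(self, regex: str):
        self.regex = regex


@dataclass
class ConfigWithObject:
    guided_decoding_params: Optional[FakeGuidedParams] = None


@dataclass
class ConfigWithRegex:
    vllm_guided_decoding_regex: Optional[str] = None


def to_json(config) -> str:
    # Assumption: the dump passes raw field values to json.dumps.
    return json.dumps(vars(config))


regex = r"<answer>\n.*\n</answer>"

# Storing the object in the config makes the dump fail with a TypeError.
try:
    to_json(ConfigWithObject(FakeGuidedParams(regex)))
    object_ok = True
except TypeError:
    object_ok = False

# Storing only the regex string round-trips through JSON unchanged.
dumped = to_json(ConfigWithRegex(regex))
string_ok = json.loads(dumped)["vllm_guided_decoding_regex"] == regex

print(object_ok, string_ok)
```

Keeping only a plain string in the config sidesteps the problem, at the cost of hiding the other guided-decoding options.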

kldzj · 2025-02-10T21:14:34Z

@qgallouedec Made the suggested change. :)

* ✨ Add vLLM guided decoding support to GRPO Trainer
* 🔧 Update vLLM guided decoding in GRPO to use regex parameter
* style and docstring
* test

Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>

✨ Add vLLM guided decoding support to GRPO Trainer

235fe05

Merge branch 'main' into pr/kldzj/2811

012b74b

🔧 Update vLLM guided decoding in GRPO to use regex parameter

1af2894

qgallouedec and others added 4 commits February 13, 2025 14:09

style and docstring

223f866

test

067dfc6

Merge branch 'main' into grpo-guided-decoding

8dc7678

Merge branch 'main' into grpo-guided-decoding

f61c077

qgallouedec approved these changes Feb 18, 2025

qgallouedec merged commit 49adf74 into huggingface:main Feb 18, 2025
13 checks passed
