Remote GRPO ref model #2763
If the ref_model lives on another machine (node), I don't fully understand how FSDP can take place. To improve my understanding, I'd appreciate an explanation of how the sync can happen without conflict.
This could conflict with PR #2684, though; it may need a note in the docs.
@shirinyamani Only the model being optimized is sharded in this setting; the ref model runs on another node to free memory on the node used for optimization. @Superskyyy, good point. Yes, I do not think iterative GRPO is compatible with this option.
@@ -78,6 +78,9 @@ class GRPOConfig(TrainingArguments):
Number of updates steps to accumulate the gradients for, before performing a backward/update pass.
beta (`float`, *optional*, defaults to `0.04`):
KL coefficient.

> Parameters that control remote models
ref_model_url: str
.... Using a remote ref model isn't compatible with ref model syncing.
When a more distributed backend is built into the library, this can be resolved naturally.
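One way the incompatibility noted above could be surfaced to users is an early configuration check. The sketch below is only an illustration: `RemoteRefOptions` is a hypothetical stand-in for the relevant config fields, and the `sync_ref_model` flag name is an assumption about how ref model syncing is toggled, not something this PR confirms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteRefOptions:
    """Hypothetical subset of config fields, for illustration only."""

    # URL of the remotely hosted reference model (None means a local ref model).
    ref_model_url: Optional[str] = None
    # Assumed name of the flag that enables periodic ref model syncing.
    sync_ref_model: bool = False

    def __post_init__(self) -> None:
        # A remote ref model cannot receive weight updates from the trainer,
        # so reject the combination up front instead of failing mid-run.
        if self.ref_model_url is not None and self.sync_ref_model:
            raise ValueError(
                "ref_model_url cannot be combined with sync_ref_model: "
                "a remote ref model is not updated by the trainer."
            )
```

Failing at construction time keeps the error close to the misconfiguration rather than at the first sync step.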
@Superskyyy
@edbeeching Thanks! I'm planning some further decoupling and efficiency gains; once this is merged, I will try to add something on top of it this weekend.
Adds an option to use a remote reference model hosted on another node. The user provides the URL.
I was originally going to use `vllm serve`, but you would have to decode the ids and then re-encode them, etc. This approach seems simpler, though it may not be as robust. Speed is not an issue, since it is just a forward pass and not autoregressive generation.
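As a rough illustration of why a forward pass over token ids is sufficient, the sketch below posts already-tokenized ids to a remote endpoint, reads back per-token log-probabilities, and feeds them into the k3-style per-token KL estimate commonly used in GRPO implementations. The JSON field names (`input_ids`, `attention_mask`, `logprobs`), the helper names, and the injectable `transport` argument are all assumptions made for this example, not the interface this PR implements.

```python
import json
import math
import urllib.request
from typing import Callable, List

# A transport takes (url, request_body) and returns the raw response body.
Transport = Callable[[str, bytes], bytes]


def http_post(url: str, body: bytes) -> bytes:
    """POST a JSON body and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()


def fetch_ref_logps(
    url: str,
    input_ids: List[List[int]],
    attention_mask: List[List[int]],
    transport: Transport = http_post,
) -> List[List[float]]:
    """Send token ids (no decoding to text) and return per-token log-probs."""
    # Hypothetical request schema: the server is assumed to accept raw ids.
    payload = json.dumps({"input_ids": input_ids, "attention_mask": attention_mask})
    raw = transport(url, payload.encode("utf-8"))
    logps = json.loads(raw)["logprobs"]  # hypothetical response field
    if len(logps) != len(input_ids):
        raise ValueError("server returned a different batch size")
    return logps


def per_token_kl(policy_logps: List[float], ref_logps: List[float]) -> List[float]:
    """k3 estimator: exp(ref - policy) - (ref - policy) - 1, always >= 0."""
    out = []
    for p, r in zip(policy_logps, ref_logps):
        d = r - p
        out.append(math.exp(d) - d - 1.0)
    return out
```

Keeping the transport injectable makes it possible to exercise the client without a live server; in training, the returned log-probabilities would be weighted by `beta` in the loss.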
Usage:
Still to do: