[Feat] Add support for Dr.GRPO algorithm. Provide a better format reward function for countdown task. #1
Open
Bonjir wants to merge 3 commits into JerryWu-code:main from