🍃 GRPO - Do not load reference model when beta == 0 #2806
+56 −12
Merged