Thanks for the paper and for releasing this code base!

I had a question about whether this line of code: https://github.com/liziniu/ReMax/blob/master/step3_rlhf_finetuning/remax_trainer.py#L250 should be

`cumulative_kl +=`

rather than

`cumulative_kl =`

As written, it seems the KL accumulation is not actually happening.

---

You are right! Line 250 calculates the one-step KL divergence rather than the cumulative KL. To calculate the cumulative KL, it should be `cumulative_kl +=`.
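For illustration, here is a minimal sketch of the difference between the two versions. Only `cumulative_kl` and the `=` vs `+=` distinction come from the linked line; the loop structure, tensor names (`log_probs`, `ref_log_probs`, `mask`), and shapes are assumptions, not the actual trainer code.

```python
import torch

# Assumed shapes for the sketch.
batch_size, seq_len, num_steps = 4, 16, 3

cumulative_kl = 0.0
for step in range(num_steps):
    # Stand-ins for per-token log-probs from the policy and reference models,
    # plus an attention mask (1 = real token, 0 = padding).
    log_probs = torch.randn(batch_size, seq_len)
    ref_log_probs = torch.randn(batch_size, seq_len)
    mask = torch.ones(batch_size, seq_len)

    # One-step KL estimate between policy and reference on this batch.
    step_kl = ((log_probs - ref_log_probs) * mask).sum(dim=-1).mean()

    # Buggy version: overwrites the running total each step,
    # so only the last step's KL survives.
    # cumulative_kl = step_kl.item()

    # Fixed version: accumulates the KL across steps.
    cumulative_kl += step_kl.item()

print(f"cumulative KL over {num_steps} steps: {cumulative_kl:.4f}")
```

With `=`, the variable only ever holds the most recent step's KL, so any logging or penalty computed from it silently ignores earlier steps; `+=` gives the intended running sum.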