Thanks for the paper and for releasing this code base!

I had a question about whether this line of code: https://github.com/liziniu/ReMax/blob/master/step3_rlhf_finetuning/remax_trainer.py#L250 should be

`cumulative_kl +=`

rather than

`cumulative_kl =`

As written, it seems the KL accumulation is not actually happening.

---

You are right! Line 250 calculates the one-step KL divergence rather than the cumulative KL. To calculate the cumulative KL, it should be `cumulative_kl +=`.
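For illustration, here is a minimal sketch of the difference between the two versions. Only `cumulative_kl` and the `=` vs `+=` distinction come from the linked line; the loop structure, tensor names (`log_probs`, `ref_log_probs`, `mask`), and shapes are assumptions, not the actual trainer code.

```python
import torch

# Assumed shapes for the sketch.
batch_size, seq_len, num_steps = 4, 16, 3

cumulative_kl = 0.0
for step in range(num_steps):
    # Stand-ins for per-token log-probs from the policy and reference models,
    # plus an attention mask (1 = real token, 0 = padding).
    log_probs = torch.randn(batch_size, seq_len)
    ref_log_probs = torch.randn(batch_size, seq_len)
    mask = torch.ones(batch_size, seq_len)

    # One-step KL estimate between policy and reference on this batch.
    step_kl = ((log_probs - ref_log_probs) * mask).sum(dim=-1).mean()

    # Buggy version: overwrites the running total each step,
    # so only the last step's KL survives.
    # cumulative_kl = step_kl.item()

    # Fixed version: accumulates the KL across steps.
    cumulative_kl += step_kl.item()

print(f"cumulative KL over {num_steps} steps: {cumulative_kl:.4f}")
```

With `=`, the variable only ever holds the most recent step's KL, so any logging or penalty computed from it silently ignores earlier steps; `+=` gives the intended running sum.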