The KL divergence collapses but the format reward becomes larger #373

yuki-younai · 2025-02-19T14:44:14Z

I had a problem where the KL divergence suddenly became very large during training and the format reward suddenly went up, causing the training to crash.
#255

yuki-younai · 2025-02-19T14:45:43Z

I used Qwen-2.5-1.5B-Instruct for GRPO training without changing any configuration files.
