Why not SFT-cold-start first? #121

dszpr · 2025-02-20T02:37:42Z

Hi! Many thanks for the great work!
In training DeepSeek-R1, the authors first used high-quality CoT data for an SFT cold start (Step I), and then applied GRPO (Step II).
However, the R1-V project applies GRPO directly, without an SFT cold start. Why is that?
