Hi! Does anyone have details on how DeepSeek distilled R1 into the smaller models? The technical report gives almost no information beyond saying that they used SFT and the 800k-example dataset that was used to train R1.
If they just ran SFT on Qwen with these examples, that isn't really distillation. Distillation would mean using R1 to produce scores (i.e., output logits) and then fine-tuning Qwen to match those scores.
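To make the distinction concrete, here is a minimal PyTorch sketch of the two losses I mean (shapes, temperature, and data are purely illustrative, not what DeepSeek actually did): (a) plain SFT, i.e. cross-entropy on tokens the teacher generated, which is all the report seems to describe, versus (b) logit distillation, where the student is trained to match the teacher's softened token distribution via KL divergence.

```python
import torch
import torch.nn.functional as F

def sft_loss(student_logits, target_ids):
    # (a) Ordinary SFT on teacher-generated text: cross-entropy against the
    # token ids that the teacher (e.g. R1) produced. No teacher logits needed.
    return F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        target_ids.view(-1),
    )

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # (b) Classic distillation: minimize KL between the teacher's softened
    # per-token distribution and the student's, scaled by T^2 as usual.
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy tensors: batch=2, seq_len=4, vocab=10 (real models would be far larger).
student_logits = torch.randn(2, 4, 10, requires_grad=True)
teacher_logits = torch.randn(2, 4, 10)     # would come from R1 in case (b)
target_ids = torch.randint(0, 10, (2, 4))  # R1-generated tokens in case (a)

print("SFT loss:", sft_loss(student_logits, target_ids).item())
print("KD  loss:", logit_distillation_loss(student_logits, teacher_logits).item())
```

Case (a) only needs R1's sampled outputs, while case (b) needs access to R1's logits at training time, which is a much heavier setup.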
Does anyone know more about this?