Open
Description
total_loss = (1 - self.kd_ratio) * lm_loss + self.kd_ratio * distil_loss
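Here `lm_loss` and `distil_loss` differ by roughly two to three orders of magnitude (a factor of about a hundred to a thousand). Does it make sense to combine them directly like this? In my actual runs, `lm_loss` starts in the tens and converges to around 0.1, whereas `distil_loss` starts at around 0.001 and converges to around 0.0001. With this weighting, `distil_loss` has almost no effect.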
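To make the concern concrete, here is a rough sketch of how much of `total_loss` the distillation term accounts for at the magnitudes reported above. The helper `distil_share` is hypothetical (not part of the repository), `kd_ratio = 0.5` is an assumed value, and `30.0` is only a stand-in for "in the tens".

```python
def distil_share(lm_loss: float, distil_loss: float, kd_ratio: float) -> float:
    """Fraction of total_loss contributed by the distillation term.

    Mirrors: total_loss = (1 - kd_ratio) * lm_loss + kd_ratio * distil_loss
    """
    lm_term = (1 - kd_ratio) * lm_loss
    kd_term = kd_ratio * distil_loss
    return kd_term / (lm_term + kd_term)


kd_ratio = 0.5  # assumed value; the real setting is not stated in this issue

# Start of training: lm_loss in the tens (30.0 is a placeholder), distil_loss ~ 0.001
early = distil_share(lm_loss=30.0, distil_loss=1e-3, kd_ratio=kd_ratio)

# Near convergence: lm_loss ~ 0.1, distil_loss ~ 0.0001
late = distil_share(lm_loss=0.1, distil_loss=1e-4, kd_ratio=kd_ratio)

print(f"distil share early: {early:.2e}, late: {late:.2e}")
```

Under these assumptions the distillation term stays below 0.1% of the total at both points. The share of the loss value is only a proxy for how much each term influences the gradients, but it shows why the weighted sum looks dominated by `lm_loss`.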