Loss fusion problem in white-box distillation #34

@poryfly

total_loss = (1 - self.kd_ratio) * lm_loss + self.kd_ratio * distil_loss
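Here lm_loss and distil_loss differ by a factor of nearly a hundred, or even a thousand. Does it make sense to combine them directly like this? In my actual runs, lm_loss starts out in the tens and converges to around 0.1, while distil_loss starts at around 0.001 and converges to around 0.0001. Weighted this way, distil_loss has essentially no effect.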
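To make the scale gap concrete, here is a minimal sketch of the arithmetic behind the weighted sum above. The helper name `fused_loss_shares`, the value `kd_ratio = 0.5`, and the starting lm_loss of 30.0 (standing in for "in the tens") are assumptions for illustration, not values taken from the repository. Note that a term's share of the loss value is only a rough proxy, since the gradient magnitudes of the two terms may differ from their loss magnitudes.

```python
def fused_loss_shares(lm_loss, distil_loss, kd_ratio):
    """Return (total, lm_share, distil_share) for the weighted sum
    total = (1 - kd_ratio) * lm_loss + kd_ratio * distil_loss."""
    lm_term = (1 - kd_ratio) * lm_loss
    kd_term = kd_ratio * distil_loss
    total = lm_term + kd_term
    return total, lm_term / total, kd_term / total


# Rough magnitudes quoted in the issue; kd_ratio = 0.5 and lm_loss = 30.0
# are assumed values chosen only for illustration.
for stage, lm, kd in [("start", 30.0, 1e-3), ("converged", 0.1, 1e-4)]:
    total, lm_share, kd_share = fused_loss_shares(lm, kd, kd_ratio=0.5)
    print(f"{stage}: total={total:.6f}, "
          f"lm share={lm_share:.4%}, distil share={kd_share:.4%}")
```

With these magnitudes the distillation term makes up well under 1% of the total loss value at both stages.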
