@ApGa (Collaborator) commented Jan 3, 2026

  1. Adds an on-policy distillation trainer, addressing #18 (Distillation experiments).
  2. Fixes a masking issue in the generator (fix by @adityasoni9998).

Example plot of negative reverse KL between Qwen3-4B (student) and Qwen3-8B (teacher).

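For readers unfamiliar with the plotted metric: on-policy distillation is commonly formulated around the reverse KL divergence KL(student ‖ teacher), which is an expectation over tokens sampled from the student. For one sampled token x, log p_student(x) - log p_teacher(x) is a single-sample estimate of that per-token divergence. The plot shows the negative of this quantity, so values closer to zero mean the student's distribution is closer to the teacher's on the student's own samples. The sketch below assumes that per-token log-probabilities from both models are already available as plain lists and that a 0/1 mask marks which tokens count (for example, response tokens only). The function names `negative_reverse_kl` and `exact_reverse_kl` are hypothetical and are not taken from this PR's trainer.

```python
import math
from typing import Sequence


def negative_reverse_kl(
    student_logprobs: Sequence[float],
    teacher_logprobs: Sequence[float],
    loss_mask: Sequence[int],
) -> float:
    """Hypothetical helper: masked mean of -(log p_student - log p_teacher).

    Assumes the tokens were sampled from the student (on-policy), so each
    log p_s(x) - log p_t(x) is a one-sample estimate of the per-token
    reverse KL, KL(student || teacher).
    """
    if not (len(student_logprobs) == len(teacher_logprobs) == len(loss_mask)):
        raise ValueError("all inputs must have the same length")
    total, count = 0.0, 0
    for s, t, m in zip(student_logprobs, teacher_logprobs, loss_mask):
        if m:  # only tokens selected by the (assumed) mask contribute
            total += s - t
            count += 1
    if count == 0:
        return 0.0  # nothing to average over
    return -total / count


def exact_reverse_kl(p_student: Sequence[float], p_teacher: Sequence[float]) -> float:
    """Exact KL(student || teacher) = sum_x p_s(x) * (log p_s(x) - log p_t(x)).

    The expectation is taken under the student, which is what makes it the
    "reverse" direction. Assumes p_t(x) > 0 wherever p_s(x) > 0; otherwise
    math.log(0.0) raises ValueError (the divergence would be infinite).
    """
    kl = 0.0
    for ps, pt in zip(p_student, p_teacher):
        if ps > 0.0:  # terms with p_s(x) = 0 contribute nothing
            kl += ps * (math.log(ps) - math.log(pt))
    return kl
```

As a usage example, with student log-probs `[log 0.5, log 0.9, log 0.2]`, teacher log-probs `[log 0.25, log 0.9, log 0.8]`, and mask `[1, 1, 0]`, only the first two tokens count: their per-token estimates are log 2 and 0, so `negative_reverse_kl` returns -(log 2)/2. `exact_reverse_kl` is included only to show the quantity that the sampled estimate targets in expectation; computing it requires full next-token distributions rather than just the log-probs of sampled tokens.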

@ApGa changed the title from "[WIP] On-policy distillation support" to "feat: Add on-policy distillation support" Jan 3, 2026
@ApGa requested a review from lintangsutawika January 5, 2026 17:43
@ApGa marked this pull request as ready for review January 5, 2026 17:44

3 participants