Move GKDTrainer to experimental module #4474
Merged
Resolves #4462
- Move GKDTrainer and GKDConfig to trl.experimental.gkd
- Add deprecation warnings in original locations (removal in TRL 0.29)
- Update tests and examples to use new import path
- Update documentation with migration guidance
- Move GKD from Trainers to Experimental section in docs
- Maintain backward compatibility until TRL 0.29
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
- Updated trainer references in index.md, dataset_formats.md, example_overview.md, and gkd_trainer.md
- Moved test file to tests/experimental/ directory
- Updated test imports from relative to parent directory
qgallouedec (Member) reviewed on Nov 13, 2025 and left a comment:
lgtm, thanks!
qgallouedec approved these changes on Nov 13, 2025
qgallouedec added a commit that referenced this pull request on Nov 21, 2025:
commit 52ed4df Author: Quentin Gallouédec Date: Thu Nov 20 21:41:23 2025 +0000 Fix style OpenEnv example
commit a263946 Author: Sergio Paniego Blanco Date: Thu Nov 20 14:44:15 2025 +0100 Update OpenEnv guide with latest details (#4552) Co-authored-by: burtenshaw
commit 1a9ff52 Author: Kashif Rasul Date: Wed Nov 19 15:34:25 2025 +0100 [OpenEnv] browsergym example script (#4539) Co-authored-by: Sergio Paniego Blanco
commit 6cbcd94 Author: Sergio Paniego Blanco Date: Wed Nov 19 14:39:44 2025 +0100 Update OpenEnv example scripts (#4547)
commit 8510589 Author: Sergio Paniego Blanco Date: Wed Nov 19 14:39:20 2025 +0100 Add OpenEnv Script examples to docs (#4533)
commit e622196 Author: Quentin Gallouédec Date: Mon Nov 17 03:12:30 2025 -0700 [Doc] Drop dummy reward and dataset for DeepMath-103K and accuracy reward (#4524)
commit 1b1242c Author: Kashif Rasul Date: Fri Nov 14 20:51:41 2025 +0100 [OpenEnv] add vllm colocate mode to openenv scripts (#4510) Co-authored-by: Sergio Paniego Blanco, Quentin Gallouédec
commit f39d18a Author: Fabio Milentiansen Sim Date: Fri Nov 14 23:39:02 2025 +0700 fix(GOLDTrainer): Resolve incorrect attribute access and VLLMClient.generate() output type (#4526)
commit d45eaab Author: Sergio Paniego Blanco Date: Fri Nov 14 12:12:09 2025 +0100 Add vLLM quantization option for colocate (#4496) Co-authored-by: Kashif Rasul
commit a91d4b3 Author: Sergio Paniego Blanco Date: Fri Nov 14 02:19:08 2025 +0100 Prevent upcasting norm layers in `prepare_model_for_kbit_training` (#4457) Co-authored-by: Quentin Gallouédec
commit 121318e Author: Behrooz Azarkhalili Date: Thu Nov 13 17:13:16 2025 -0800 docs: Extend CLI basic usage examples to all supported CLIs (#4425) Co-authored-by: Sergio Paniego Blanco, Quentin Gallouédec
commit 7918320 Author: Quentin Gallouédec Date: Thu Nov 13 13:20:52 2025 -0700 Remove test trainer args (#4517)
commit 102dc41 Author: Quentin Gallouédec Date: Thu Nov 13 12:36:43 2025 -0700 Rename `flash-attn` to `flash-attn2` (#4514) Co-authored-by: Sergio Paniego Blanco
commit 5de62b0 Author: Quentin Gallouédec Date: Thu Nov 13 12:05:48 2025 -0700 Add step time metric to GRPO Trainer for performance tracking (#4516) Co-authored-by: lewtun
commit f1e6377 Author: Behrooz Azarkhalili Date: Thu Nov 13 11:01:19 2025 -0800 Move PPOTrainer to trl.experimental.ppo (#4482) Co-authored-by: Quentin Gallouédec
commit 01f497e Author: Behrooz Azarkhalili Date: Thu Nov 13 10:14:58 2025 -0800 Move NashMDTrainer to experimental module (#4477) Co-authored-by: Quentin Gallouédec
commit b6c838a Author: Quentin Gallouédec Date: Thu Nov 13 16:53:26 2025 +0000 `aws-general-8-plus` runner for Docker build
commit ed5c7bb Author: YangKai0616 Date: Fri Nov 14 00:42:48 2025 +0800 [Bug Fix] OnlineDPOTrainer with vLLM Server Mode (#4500)
commit ded9bc6 Author: lewtun Date: Thu Nov 13 17:33:59 2025 +0100 Fix Docker images for Liger (#4522)
commit fd04760 Author: Pramodith Ballapuram Date: Thu Nov 13 11:31:10 2025 +0000 Paper Index: Change `num_completions` to `num_generations` (#4515)
commit b7918c0 Author: Behrooz Azarkhalili Date: Wed Nov 12 20:35:44 2025 -0800 Move GKDTrainer to experimental module (#4474) Co-authored-by: Quentin Gallouédec
commit 07b5011 Author: Tamoghno Kandar Date: Wed Nov 12 20:07:33 2025 -0800 Replace flash attention2 with kernels-community/flash-attn2 (#4426) Co-authored-by: Quentin Gallouédec
commit 7a57fd4 Author: Yuxian Gu Date: Thu Nov 13 11:16:20 2025 +0800 MiniLLM: Fix arguments in config & add to documentation index (#4518)
commit a145eaf Author: Behrooz Azarkhalili Date: Wed Nov 12 16:35:46 2025 -0800 refactor: Move CPOTrainer to experimental module (#4470)
commit d2dc717 Author: Taha Yassine Date: Thu Nov 13 00:56:47 2025 +0100 Replace `wandb_log_unique_prompts` with `log_unique_prompts` (#4508) Co-authored-by: Quentin Gallouédec
commit 799b39b Author: Quentin Gallouédec Date: Wed Nov 12 16:21:05 2025 -0700 `device_map` and `dtype` to `"auto"` by default (#4509) Co-authored-by: Sergio Paniego Blanco
commit a6a2beb Author: Quentin Gallouédec Date: Wed Nov 12 09:42:31 2025 -0700 Add temporary workaround for `lr_scheduler_kwargs` dtype issue in Transformers 4.57.0 (#4513)
commit 346701a Author: lewtun Date: Wed Nov 12 17:42:18 2025 +0100 Replace accelerate logging with stdlib in CLI (#4512)
commit 4db63af Author: Quentin Gallouédec Date: Wed Nov 12 02:19:51 2025 +0000 Fix GRPO unsqueeze advantages
commit ecb2811 Author: Yuxian Gu Date: Wed Nov 12 10:17:22 2025 +0800 Add MiniLLM Trainer (#4504) Co-authored-by: Quentin Gallouédec
commit 89e4688 Author: Taha Yassine Date: Tue Nov 11 20:36:23 2025 +0100 Add support for images inside tables with Trackio completions logging (#4505)
commit 2d3279c Author: lewtun Date: Tue Nov 11 19:22:25 2025 +0100 Tweak description for vLLM sleep mode (#4506) Co-authored-by: Quentin Gallouédec
commit 02a3477 Author: Luke Hinds Date: Mon Nov 10 16:41:51 2025 +0000 Fix link to OpenEnv docs (#4502) Co-authored-by: Quentin Gallouédec
commit aaed6c1 Author: Quentin Gallouédec Date: Sat Nov 8 08:20:48 2025 -0700 Consistency regarding relative imports (#4498)
commit 20760ba Author: burtenshaw Date: Fri Nov 7 10:50:50 2025 +0100 [DOCS] update and fix openenv (#4490) Co-authored-by: Kashif Rasul, Sergio Paniego Blanco
commit 64cfca4 Author: Behrooz Azarkhalili Date: Thu Nov 6 22:47:04 2025 -0800 Move judges to experimental submodule (#4439) Co-authored-by: Quentin Gallouédec
commit 97ca1a2 Author: Pramodith Ballapuram Date: Fri Nov 7 00:20:15 2025 +0000 Fix bugs in CISPO conditions (#4499)
commit ffb3dd5 Author: Behrooz Azarkhalili Date: Thu Nov 6 16:03:00 2025 -0800 docs: Add PEFT subsection to reducing memory usage guide (#4430) Co-authored-by: Sergio Paniego Blanco
commit 43b6541 Author: SolarWindRider Date: Fri Nov 7 06:55:34 2025 +0800 Support completion bootstrap for VLM in GRPO/RLOO (#4452) Co-authored-by: Albert Villanova del Moral, Quentin Gallouédec
commit 642b721 Author: Pramodith Ballapuram Date: Thu Nov 6 22:33:00 2025 +0000 ScaleRL: Add CISPO Loss (#4495)
commit 32e9c9f Author: Ishita Bhattacharyya Date: Fri Nov 7 03:37:43 2025 +0530 ⛴️ Add kernels to Docker images (#4445) Co-authored-by: Quentin Gallouédec
commit 1bcfc50 Author: Behrooz Azarkhalili Date: Thu Nov 6 13:40:12 2025 -0800 Move XPOTrainer to trl.experimental.xpo (#4485) Co-authored-by: Invidia19, Quentin Gallouédec
commit 37942bc Author: Pramodith Ballapuram Date: Thu Nov 6 21:32:03 2025 +0000 Buffer samples based on group level stds. (#4492)
commit 66cd02a Author: Albert Villanova del Moral Date: Thu Nov 6 20:58:25 2025 +0100 Add tiny model Qwen3VLForConditionalGeneration to CI (#4494)
commit 32febb4 Author: Sergio Paniego Blanco Date: Thu Nov 6 18:21:56 2025 +0100 Add LFM2 to SFT notebook examples (#4455)
qgallouedec added a commit that referenced this pull request on Nov 24, 2025:
commit 4cb1a25 Author: Kashif Rasul Date: Sat Nov 22 23:31:29 2025 +0100 [SFT] Log mean token accuracy from Liger kernel (#4302) Co-authored-by: Quentin Gallouédec
commit 468b9d4 Author: Susant Date: Sun Nov 23 03:40:32 2025 +0530 docs: add KTO (2402.01306) to Paper Index + link ref to KTOTrainer (#4440) Co-authored-by: Quentin Gallouédec
commit 9bc6206 Author: Behrooz Azarkhalili Date: Fri Nov 21 17:34:50 2025 -0800 Move PRMTrainer to trl.experimental.prm (#4483) Co-authored-by: Quentin Gallouédec
commit f7ac974 Author: Sergio Paniego Blanco Date: Fri Nov 21 16:01:04 2025 +0100 Update OpenEnv guide with new notebook (#4555)
commit c0de042 Author: Sergio Paniego Blanco Date: Fri Nov 21 15:40:25 2025 +0100 Add GRPO Wordle OpenEnv Colab (#4542)
commit 9f8ef40 Author: Behrooz Azarkhalili Date: Thu Nov 20 22:36:31 2025 -0800 [ORPO] Move ORPOTrainer to experimental (#4480)
commit 3bb5d76 Author: Jen Wei Date: Thu Nov 20 18:53:10 2025 -0700 fix+docs: `device_map=None` for DeepSpeed and add ZeRO paper (1910.02054) to Paper Index (#4551)
commit 375b3eb Author: Jonny Li Date: Thu Nov 20 19:42:45 2025 -0500 Add target_parameters to LoraConfig (#4536)
commit 237900d Author: Kristian Schwethelm Date: Thu Nov 20 23:03:20 2025 +0100 Fix bug with VLM processors in prompt-completion completion text-only training (#4553) Co-authored-by: Quentin Gallouédec
Summary
This PR migrates `GKDTrainer` and `GKDConfig` from `trl.trainer` to `trl.experimental.gkd` as part of the TRL V1 refactoring effort.
Resolves #4462
Related to #4374 (Road to v1)
Related to #4223 (Experimental trainers RFC)
Changes Made
Module Structure
- `trl/experimental/gkd/` module with `__init__.py`, `gkd_config.py`, and `gkd_trainer.py`
- Relative imports updated (`..` to `...`)
Backward Compatibility
- Deprecation warnings in `__init__` and `__post_init__` methods
Tests & Examples
- Updated `tests/test_gkd_trainer.py` to import from `trl.experimental.gkd`
- Updated `examples/scripts/gkd.py` to import from experimental location
Documentation
- Updated `docs/source/gkd_trainer.md` with new import examples
- `docs/source/_toctree.yml`
- Updated `docs/source/reducing_memory_usage.md` import example
- Updated `docs/source/liger_kernel_integration.md` import example
Migration Path
Before (deprecated, will be removed in TRL 0.29):
After (recommended):
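Based on the module paths named in this PR, the two import forms can be sketched as follows. This is a sketch under assumptions rather than the PR's own snippet: it assumes the old import is the top-level `from trl import GKDConfig, GKDTrainer`, and `load_gkd` is a hypothetical helper (not part of TRL) for code that has to run on TRL versions from both before and after the move.

```python
import importlib


def load_gkd():
    """Return (GKDConfig, GKDTrainer), preferring the experimental location.

    Hypothetical helper, not part of TRL.
    """
    try:
        # After (recommended):
        #   from trl.experimental.gkd import GKDConfig, GKDTrainer
        module = importlib.import_module("trl.experimental.gkd")
    except ImportError:
        # Before (deprecated, assumed top-level export):
        #   from trl import GKDConfig, GKDTrainer
        module = importlib.import_module("trl")
    return module.GKDConfig, module.GKDTrainer
```

Code that only targets TRL versions containing this PR can use the recommended import directly, without the helper.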
Testing
Checklist
trl/experimental/gkd/
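The backward-compatibility approach described under Changes Made (deprecation warnings in `__init__` and `__post_init__`) can be illustrated with a minimal, library-free sketch. All class names here are hypothetical stand-ins, not TRL's actual code, and the choice of `FutureWarning` and the message wording are assumptions.

```python
import warnings
from dataclasses import dataclass


@dataclass
class NewConfig:
    """Hypothetical stand-in for a config class at its new location."""

    temperature: float = 1.0

    def __post_init__(self):
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")


class NewTrainer:
    """Hypothetical stand-in for a trainer class at its new location."""

    def __init__(self, args: NewConfig):
        self.args = args


@dataclass
class OldConfig(NewConfig):
    """Shim kept at the old location. The dataclass __init__ is generated,
    so the warning is emitted from __post_init__ instead."""

    def __post_init__(self):
        warnings.warn(
            "OldConfig is deprecated; import NewConfig from its new location.",
            FutureWarning,  # assumed warning category
            stacklevel=3,  # skip __post_init__ and the generated __init__
        )
        super().__post_init__()


class OldTrainer(NewTrainer):
    """Shim kept at the old location; warns, then defers to the new class."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "OldTrainer is deprecated; import NewTrainer from its new location.",
            FutureWarning,  # assumed warning category
            stacklevel=2,  # point at the caller that instantiated the shim
        )
        super().__init__(*args, **kwargs)
```

Because each shim subclasses the relocated class, existing `isinstance` checks and keyword arguments keep working during the deprecation window, and removing the shims later does not touch the new module.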