@behroozazarkhalili
Collaborator

Summary

This PR migrates GKDTrainer and GKDConfig from trl.trainer to trl.experimental.gkd as part of the TRL v1 refactoring effort.

Resolves #4462
Related to #4374 (Road to v1)
Related to #4223 (Experimental trainers RFC)

Changes Made

Module Structure

  • ✅ Created trl/experimental/gkd/ module with __init__.py, gkd_config.py, and gkd_trainer.py
  • ✅ Moved GKDTrainer (450 lines) and GKDConfig (113 lines) to experimental location
  • ✅ Updated imports in experimental version (relative path adjustments from .. to ...)
  • ✅ Removed TRL_EXPERIMENTAL_SILENCE warning (no longer needed in experimental module)
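
The relative-import adjustment noted above (from .. to ...) follows from the module now sitting one package level deeper. A minimal sketch using the standard library's importlib.util.resolve_name shows both forms resolving to the same absolute module. The target name `models` is only an illustrative assumption and is not taken from the PR diff.

```python
from importlib.util import resolve_name

# Old location: the trainer module lived in the package `trl.trainer`,
# so two leading dots climb to the top-level `trl` package.
old_target = resolve_name("..models", package="trl.trainer")

# New location: the module lives in `trl.experimental.gkd`, one level deeper,
# so a third leading dot is needed to reach the same top-level package.
new_target = resolve_name("...models", package="trl.experimental.gkd")

# Both relative names point at the same absolute module path.
print(old_target, new_target)
```

resolve_name only computes the absolute name and does not import anything, so the sketch runs without TRL installed.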

Backward Compatibility

  • ✅ Added deprecation stubs in the original locations (removal in TRL 0.29)

Tests & Examples

  • ✅ Updated tests/test_gkd_trainer.py to import from trl.experimental.gkd
  • ✅ Updated examples/scripts/gkd.py to import from experimental location

Documentation

  • ✅ Updated docs/source/gkd_trainer.md with new import examples
  • ✅ Moved GKD from Trainers to Experimental section in docs/source/_toctree.yml
  • ✅ Updated docs/source/reducing_memory_usage.md import example
  • ✅ Updated docs/source/liger_kernel_integration.md import example

Migration Path

Before (deprecated, will be removed in TRL 0.29):

from trl import GKDConfig, GKDTrainer

After (recommended):

from trl.experimental.gkd import GKDConfig, GKDTrainer
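
For code that has to run against TRL versions from both before and after this move, one option is to try the new path first and fall back to the old one. The sketch below is an assumption about how a downstream project might handle this, not something the PR ships; `import_first` and `load_gkd` are hypothetical helper names.

```python
import importlib


def import_first(candidates, names):
    """Return the requested attributes from the first module path that provides them."""
    last_error = None
    for module_path in candidates:
        try:
            module = importlib.import_module(module_path)
            return tuple(getattr(module, name) for name in names)
        except (ImportError, AttributeError) as error:
            # Module missing, or present but without the requested names.
            last_error = error
    raise ImportError(f"none of {candidates} provide {names}") from last_error


def load_gkd():
    # Prefer the new experimental path; fall back to the deprecated top-level one.
    return import_first(["trl.experimental.gkd", "trl"], ["GKDConfig", "GKDTrainer"])
```

Usage would be `GKDConfig, GKDTrainer = load_gkd()`. Once the minimum supported TRL version includes trl.experimental.gkd, the plain import shown under "After" is simpler.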

Testing

  • ✅ All existing tests updated to use new import path
  • ✅ Backward compatibility maintained through deprecation stubs
  • ✅ Example scripts updated and verified
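
The deprecation stubs mentioned above are not shown in this description. A minimal sketch of the general pattern, assuming a thin subclass that warns on construction, might look like the following. The stand-in `GKDConfig` body, the `DeprecatedGKDConfig` name, the warning category, and the `example_option` argument are all hypothetical; the actual stub in TRL may differ.

```python
import warnings


class GKDConfig:
    """Stand-in for the class that now lives in trl.experimental.gkd (hypothetical body)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class DeprecatedGKDConfig(GKDConfig):
    """Hypothetical stub kept at the old import location."""

    def __init__(self, *args, **kwargs):
        # Warn the caller, then behave exactly like the relocated class.
        warnings.warn(
            "Importing GKDConfig from the old location is deprecated and will be "
            "removed in TRL 0.29. Use `from trl.experimental.gkd import GKDConfig`.",
            FutureWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Record warnings to confirm that the stub warns but still builds a usable object.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config = DeprecatedGKDConfig(example_option=1)
```

Subclassing keeps isinstance checks against the relocated class working while the old name is still in use.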

Checklist

  • Module structure created in trl/experimental/gkd/
  • Deprecation stubs added in original locations
  • Tests updated
  • Examples updated
  • Documentation updated
  • Backward compatibility maintained
  • Follows established migration pattern (BCO, CPO, OnlineDPO)

- Move GKDTrainer and GKDConfig to trl.experimental.gkd
- Add deprecation warnings in original locations (removal in TRL 0.29)
- Update tests and examples to use new import path
- Update documentation with migration guidance
- Move GKD from Trainers to Experimental section in docs
- Maintain backward compatibility until TRL 0.29

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@qgallouedec qgallouedec left a comment

lgtm, thanks!

@qgallouedec qgallouedec merged commit b7918c0 into main Nov 13, 2025
5 of 13 checks passed
@qgallouedec qgallouedec deleted the refactor/move-gkd-to-experimental branch November 13, 2025 04:35
qgallouedec added a commit that referenced this pull request Nov 21, 2025
commit 52ed4df
Author: Quentin Gallouédec <[email protected]>
Date:   Thu Nov 20 21:41:23 2025 +0000

    Fix style OpenEnv example

commit a263946
Author: Sergio Paniego Blanco <[email protected]>
Date:   Thu Nov 20 14:44:15 2025 +0100

    Update OpenEnv guide with latest details (#4552)

    Co-authored-by: burtenshaw <[email protected]>

commit 1a9ff52
Author: Kashif Rasul <[email protected]>
Date:   Wed Nov 19 15:34:25 2025 +0100

    [OpenEnv] browsergym example script (#4539)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit 6cbcd94
Author: Sergio Paniego Blanco <[email protected]>
Date:   Wed Nov 19 14:39:44 2025 +0100

    Update OpenEnv example scripts (#4547)

commit 8510589
Author: Sergio Paniego Blanco <[email protected]>
Date:   Wed Nov 19 14:39:20 2025 +0100

    Add OpenEnv Script examples to docs (#4533)

commit e622196
Author: Quentin Gallouédec <[email protected]>
Date:   Mon Nov 17 03:12:30 2025 -0700

    [Doc] Drop dummy reward and dataset for DeepMath-103K and accuracy reward (#4524)

commit 1b1242c
Author: Kashif Rasul <[email protected]>
Date:   Fri Nov 14 20:51:41 2025 +0100

    [OpenEnv] add vllm colocate mode to openenv scripts (#4510)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit f39d18a
Author: Fabio Milentiansen Sim <[email protected]>
Date:   Fri Nov 14 23:39:02 2025 +0700

    fix(GOLDTrainer): Resolve incorrect attribute access and VLLMClient.generate() output type (#4526)

commit d45eaab
Author: Sergio Paniego Blanco <[email protected]>
Date:   Fri Nov 14 12:12:09 2025 +0100

    Add vLLM quantization option for colocate (#4496)

    Co-authored-by: Kashif Rasul <[email protected]>

commit a91d4b3
Author: Sergio Paniego Blanco <[email protected]>
Date:   Fri Nov 14 02:19:08 2025 +0100

    Prevent upcasting norm layers in `prepare_model_for_kbit_training` (#4457)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 121318e
Author: Behrooz Azarkhalili <[email protected]>
Date:   Thu Nov 13 17:13:16 2025 -0800

    docs: Extend CLI basic usage examples to all supported CLIs (#4425)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 7918320
Author: Quentin Gallouédec <[email protected]>
Date:   Thu Nov 13 13:20:52 2025 -0700

    Remove test trainer args (#4517)

commit 102dc41
Author: Quentin Gallouédec <[email protected]>
Date:   Thu Nov 13 12:36:43 2025 -0700

    Rename `flash-attn` to `flash-attn2` (#4514)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit 5de62b0
Author: Quentin Gallouédec <[email protected]>
Date:   Thu Nov 13 12:05:48 2025 -0700

    Add step time metric to GRPO Trainer for performance tracking (#4516)

    Co-authored-by: lewtun <[email protected]>

commit f1e6377
Author: Behrooz Azarkhalili <[email protected]>
Date:   Thu Nov 13 11:01:19 2025 -0800

    Move PPOTrainer to trl.experimental.ppo (#4482)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 01f497e
Author: Behrooz Azarkhalili <[email protected]>
Date:   Thu Nov 13 10:14:58 2025 -0800

    Move NashMDTrainer to experimental module (#4477)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit b6c838a
Author: Quentin Gallouédec <[email protected]>
Date:   Thu Nov 13 16:53:26 2025 +0000

    `aws-general-8-plus` runner for Docker build

commit ed5c7bb
Author: YangKai0616 <[email protected]>
Date:   Fri Nov 14 00:42:48 2025 +0800

    [Bug Fix] OnlineDPOTrainer with vLLM Server Mode (#4500)

commit ded9bc6
Author: lewtun <[email protected]>
Date:   Thu Nov 13 17:33:59 2025 +0100

    Fix Docker images for Liger (#4522)

commit fd04760
Author: Pramodith Ballapuram <[email protected]>
Date:   Thu Nov 13 11:31:10 2025 +0000

    Paper Index: Change `num_completions` to `num_generations` (#4515)

commit b7918c0
Author: Behrooz Azarkhalili <[email protected]>
Date:   Wed Nov 12 20:35:44 2025 -0800

    Move GKDTrainer to experimental module (#4474)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 07b5011
Author: Tamoghno Kandar <[email protected]>
Date:   Wed Nov 12 20:07:33 2025 -0800

    Replace flash attention2 with kernels-community/flash-attn2 (#4426)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 7a57fd4
Author: Yuxian Gu <[email protected]>
Date:   Thu Nov 13 11:16:20 2025 +0800

    MiniLLM: Fix arguments in config & add to documentation index (#4518)

commit a145eaf
Author: Behrooz Azarkhalili <[email protected]>
Date:   Wed Nov 12 16:35:46 2025 -0800

    refactor: Move CPOTrainer to experimental module (#4470)

commit d2dc717
Author: Taha Yassine <[email protected]>
Date:   Thu Nov 13 00:56:47 2025 +0100

    Replace `wandb_log_unique_prompts` with `log_unique_prompts` (#4508)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 799b39b
Author: Quentin Gallouédec <[email protected]>
Date:   Wed Nov 12 16:21:05 2025 -0700

    `device_map` and `dtype` to `"auto"` by default (#4509)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit a6a2beb
Author: Quentin Gallouédec <[email protected]>
Date:   Wed Nov 12 09:42:31 2025 -0700

    Add temporary workaround for `lr_scheduler_kwargs` dtype issue in Transformers 4.57.0 (#4513)

commit 346701a
Author: lewtun <[email protected]>
Date:   Wed Nov 12 17:42:18 2025 +0100

    Replace accelerate logging with stdlib in CLI (#4512)

commit 4db63af
Author: Quentin Gallouédec <[email protected]>
Date:   Wed Nov 12 02:19:51 2025 +0000

    Fix GRPO unsqueeze advantages

commit ecb2811
Author: Yuxian Gu <[email protected]>
Date:   Wed Nov 12 10:17:22 2025 +0800

    Add MiniLLM Trainer (#4504)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 89e4688
Author: Taha Yassine <[email protected]>
Date:   Tue Nov 11 20:36:23 2025 +0100

    Add support for images inside tables with Trackio completions logging (#4505)

commit 2d3279c
Author: lewtun <[email protected]>
Date:   Tue Nov 11 19:22:25 2025 +0100

    Tweak description for vLLM sleep mode (#4506)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 02a3477
Author: Luke Hinds <[email protected]>
Date:   Mon Nov 10 16:41:51 2025 +0000

    Fix link to OpenEnv docs (#4502)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit aaed6c1
Author: Quentin Gallouédec <[email protected]>
Date:   Sat Nov 8 08:20:48 2025 -0700

    Consistency regarding relative imports (#4498)

commit 20760ba
Author: burtenshaw <[email protected]>
Date:   Fri Nov 7 10:50:50 2025 +0100

    [DOCS] update and fix openenv (#4490)

    Co-authored-by: Kashif Rasul <[email protected]>
    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit 64cfca4
Author: Behrooz Azarkhalili <[email protected]>
Date:   Thu Nov 6 22:47:04 2025 -0800

    Move judges to experimental submodule (#4439)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 97ca1a2
Author: Pramodith Ballapuram <[email protected]>
Date:   Fri Nov 7 00:20:15 2025 +0000

    Fix bugs in CISPO conditions (#4499)

commit ffb3dd5
Author: Behrooz Azarkhalili <[email protected]>
Date:   Thu Nov 6 16:03:00 2025 -0800

    docs: Add PEFT subsection to reducing memory usage guide (#4430)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit 43b6541
Author: SolarWindRider <[email protected]>
Date:   Fri Nov 7 06:55:34 2025 +0800

    Support completion bootstrap for VLM in GRPO/RLOO (#4452)

    Co-authored-by: Albert Villanova del Moral <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 642b721
Author: Pramodith Ballapuram <[email protected]>
Date:   Thu Nov 6 22:33:00 2025 +0000

    ScaleRL: Add CISPO Loss (#4495)

commit 32e9c9f
Author: Ishita Bhattacharyya <[email protected]>
Date:   Fri Nov 7 03:37:43 2025 +0530

    ⛴️ Add kernels to Docker images (#4445)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 1bcfc50
Author: Behrooz Azarkhalili <[email protected]>
Date:   Thu Nov 6 13:40:12 2025 -0800

    Move XPOTrainer to trl.experimental.xpo (#4485)

    Co-authored-by: Invidia19 <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 37942bc
Author: Pramodith Ballapuram <[email protected]>
Date:   Thu Nov 6 21:32:03 2025 +0000

    Buffer samples based on group level stds. (#4492)

commit 66cd02a
Author: Albert Villanova del Moral <[email protected]>
Date:   Thu Nov 6 20:58:25 2025 +0100

    Add tiny model Qwen3VLForConditionalGeneration to CI (#4494)

commit 32febb4
Author: Sergio Paniego Blanco <[email protected]>
Date:   Thu Nov 6 18:21:56 2025 +0100

    Add LFM2 to SFT notebook examples (#4455)
qgallouedec added a commit that referenced this pull request Nov 24, 2025
commit 4cb1a25
Author: Kashif Rasul <[email protected]>
Date:   Sat Nov 22 23:31:29 2025 +0100

    [SFT] Log mean token accuracy from Liger kernel (#4302)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 468b9d4
Author: Susant <[email protected]>
Date:   Sun Nov 23 03:40:32 2025 +0530

    docs: add KTO (2402.01306) to Paper Index + link ref to KTOTrainer (#4440)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 9bc6206
Author: Behrooz Azarkhalili <[email protected]>
Date:   Fri Nov 21 17:34:50 2025 -0800

    Move PRMTrainer to trl.experimental.prm (#4483)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit f7ac974
Author: Sergio Paniego Blanco <[email protected]>
Date:   Fri Nov 21 16:01:04 2025 +0100

    Update OpenEnv guide with new notebook (#4555)

commit c0de042
Author: Sergio Paniego Blanco <[email protected]>
Date:   Fri Nov 21 15:40:25 2025 +0100

    Add GRPO Wordle OpenEnv Colab (#4542)

commit 9f8ef40
Author: Behrooz Azarkhalili <[email protected]>
Date:   Thu Nov 20 22:36:31 2025 -0800

    [ORPO] Move ORPOTrainer to experimental (#4480)

commit 3bb5d76
Author: Jen Wei <[email protected]>
Date:   Thu Nov 20 18:53:10 2025 -0700

    fix+docs: `device_map=None` for DeepSpeed and add ZeRO paper (1910.02054) to Paper Index (#4551)

commit 375b3eb
Author: Jonny Li <[email protected]>
Date:   Thu Nov 20 19:42:45 2025 -0500

    Add target_parameters to LoraConfig (#4536)

commit 237900d
Author: Kristian Schwethelm <[email protected]>
Date:   Thu Nov 20 23:03:20 2025 +0100

    Fix bug with VLM processors in prompt-completion completion text-only training (#4553)

    Co-authored-by: Quentin Gallouédec <[email protected]>

