@behroozazarkhalili
Collaborator

Summary

This PR migrates RLOOTrainer to the experimental module as part of the TRL V1 refactoring effort.

Changes

  • Create trl.experimental.rloo module with RLOOTrainer and RLOOConfig
  • Add deprecation stubs in trl.trainer with FutureWarning (removal in TRL 0.29.0)
  • Update imports in tests, examples (3 files), and scripts
  • Update documentation:
    • Move RLOO from Trainers to Experimental section in _toctree.yml
    • Add deprecation notice to rloo_trainer.md
    • Update index.md to show experimental.rloo.RLOOTrainer with 🧪 emoji
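For readers unfamiliar with the stub pattern named above, here is a minimal sketch of what a deprecation stub could look like. It is not the code from this PR: `ExperimentalRLOOTrainer` is a hypothetical stand-in for the class that now lives in `trl.experimental.rloo`, and the warning text is illustrative. The only details taken from the description are the `FutureWarning` category and the 0.29.0 removal version.

```python
import warnings


class ExperimentalRLOOTrainer:
    """Hypothetical stand-in for the trainer at the experimental path."""

    def __init__(self, *args, **kwargs):
        # Record the arguments so the delegation is easy to inspect.
        self.args = args
        self.kwargs = kwargs


class RLOOTrainer(ExperimentalRLOOTrainer):
    """Sketch of a stub at the old path: warn, then delegate unchanged."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "RLOOTrainer has moved to trl.experimental.rloo and will be "
            "removed from trl.trainer in TRL 0.29.0.",  # illustrative wording
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(*args, **kwargs)
```

Subclassing keeps `isinstance` checks against the experimental class working for code that still imports from the old path, while the stub itself carries no implementation logic.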

Testing

  • All existing tests continue to work with deprecation warnings
  • Backward compatibility maintained through deprecation stubs
  • Import paths verified in tests and examples
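One way to check the backward-compatibility claims above is to record warnings while constructing an object through the old path. The sketch below uses only the standard library; `NewTrainer`, `OldTrainer`, and `build_and_collect_warnings` are hypothetical names for illustration and do not come from the TRL test suite.

```python
import warnings


class NewTrainer:
    """Hypothetical stand-in for the class at the experimental path."""


class OldTrainer(NewTrainer):
    """Hypothetical stand-in for the deprecation stub at the old path."""

    def __init__(self):
        warnings.warn("moved to the experimental module", FutureWarning, stacklevel=2)
        super().__init__()


def build_and_collect_warnings(factory):
    """Call factory() and return the object plus the warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        obj = factory()
    return obj, [w.category for w in caught]
```

A test could then assert that the old path yields exactly one `FutureWarning` and an instance of the new class, while the new path yields no warnings.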

Contributes to #4374
Fixes #4468

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

- Update dataset_formats.md: RLOOTrainer → experimental.rloo.RLOOTrainer
- Update example_overview.md: RLOOTrainer → experimental.rloo.RLOOTrainer
- Update rloo_trainer.md: all trainer references to experimental path
- Move test file to tests/experimental/test_rloo_trainer.py
- Update test imports to use parent directory reference

Follows the pattern from XPO PR #4485

Remove implementation code that was incorrectly merged into the deprecation wrapper. The wrapper should only contain the deprecation warning and delegate to the experimental module.
Development

Successfully merging this pull request may close these issues.

Move RLOOTrainer to trl.experimental
