
feat: Add RewardShapingOperator plugin #518

Open

aretaafandi16-ui wants to merge 1 commit into agentscope-ai:main from aretaafandi16-ui:feat/reward-shaping-operator

Conversation

@aretaafandi16-ui

Summary

Adds a new ExperienceOperator plugin for reward shaping in Trinity-RFT.

Features

  • Length-based shaping: Bonus/penalty based on response length
  • Format-based shaping: Bonus for lists, code blocks, headers
  • Configurable strategies: Easy to extend with new strategies
  • Unit tests included: 4 tests covering key scenarios
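For illustration, the length-based strategy could be a pure function along these lines. The function name, thresholds, and bonus/penalty values below are hypothetical assumptions for this sketch, not the plugin's actual defaults:

```python
def shape_reward_by_length(reward: float, response: str,
                           min_length: int = 10, max_length: int = 1000,
                           bonus: float = 0.1, penalty: float = 0.1) -> float:
    """Add a bonus when the response length falls inside the
    [min_length, max_length] window, otherwise apply a penalty.
    (Illustrative sketch; not the PR's actual implementation.)"""
    n = len(response)
    if min_length <= n <= max_length:
        return reward + bonus   # in-range responses get a small bonus
    return reward - penalty     # too short or too long gets penalized
```

For example, a 50-character response with base reward 1.0 would come back slightly boosted, while a 2-character response would be penalized.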

Why This Matters

Reward shaping is a key technique in RL fine-tuning. This operator provides a ready-to-use implementation that follows the plugin-first approach recommended in CONTRIBUTING.md.

Usage

buffer:
  operators:
    - type: "trinity.plugins.reward_shaping_operator.RewardShapingOperator"
      config:
        strategy: "length"
        min_length: 10
        max_length: 1000
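The format strategy described above could detect markdown structure roughly as follows. The detection patterns and bonus values here are illustrative assumptions, not the plugin's actual rules:

```python
import re

def format_bonus(response: str,
                 list_bonus: float = 0.05,
                 code_bonus: float = 0.1,
                 header_bonus: float = 0.05) -> float:
    """Sum small bonuses for markdown lists, fenced code blocks,
    and headers. (Illustrative sketch only.)"""
    fence = "`" * 3  # marker for a fenced code block
    bonus = 0.0
    if re.search(r"^\s*(?:[-*]|\d+\.)\s+", response, re.MULTILINE):
        bonus += list_bonus      # bullet or numbered list present
    if fence in response:
        bonus += code_bonus      # fenced code block present
    if re.search(r"^#{1,6}\s+", response, re.MULTILINE):
        bonus += header_bonus    # markdown header present
    return bonus
```

A response containing a header, a bullet list, and a fenced code block would collect all three bonuses; plain prose would collect none.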

Files Added

  • trinity/plugins/reward_shaping_operator.py — Main operator
  • tests/test_reward_shaping_operator.py — Unit tests

Built by Laboon 🐋 — AI Assistant powered by Xiaomi MiMo v2 Pro

Added a new ExperienceOperator for reward shaping:
- Length-based shaping (bonus/penalty for response length)
- Format-based shaping (bonus for lists, code blocks, headers)
- Configurable strategies and thresholds
- Includes unit tests

Follows the plugin-first approach recommended in CONTRIBUTING.md.
Ready to be graduated to trinity/buffer/operators/ after review.
