feat: Added reward model according to paper. #78

ahmeterdempmk · 2025-01-27T15:18:47Z

This PR implements the reward model training pipeline for DeepSeek-R1, adding key functionality for preference learning and model comparison.

Changes

Added reward_model.py implementing the reward model training pipeline
Introduced the RewardModelScriptArguments class for handling training configuration
Implemented dataset preparation with the prepare_comparison_dataset function
Added support for better/worse response comparison training
Integrated with Hugging Face's Trainer for model training
Added model saving and pushing to the Hugging Face Hub
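The better/worse comparison training listed above needs each example turned into a pair of full texts to score. Below is a minimal sketch of that mapping onto the `chosen`/`rejected` column layout used by TRL's preference datasets. It assumes hypothetical column names `prompt`, `better` and `worse` (the PR's real configurable column names are not shown here), a hypothetical newline separator, and plain Python dicts in place of a `datasets.Dataset`. It is not the PR's prepare_comparison_dataset.

```python
from typing import Dict, List


def to_chosen_rejected(
    rows: List[Dict[str, str]],
    prompt_col: str = "prompt",  # hypothetical column names, configurable per dataset
    better_col: str = "better",
    worse_col: str = "worse",
    sep: str = "\n",  # hypothetical separator between prompt and response
) -> List[Dict[str, str]]:
    """Convert better/worse comparison rows into chosen/rejected text pairs.

    The shared prompt is prepended to both responses so the reward model
    scores the full (prompt + response) text in each case.
    """
    pairs = []
    for row in rows:
        prompt = row[prompt_col]
        pairs.append(
            {
                "chosen": f"{prompt}{sep}{row[better_col]}",
                "rejected": f"{prompt}{sep}{row[worse_col]}",
            }
        )
    return pairs
```

For example, `to_chosen_rejected([{"prompt": "Q: 2+2?", "better": "4", "worse": "5"}])` yields a single pair whose `chosen` text ends in "4" and whose `rejected` text ends in "5". Tokenization would then happen per column.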

Implementation Details

Uses AutoModelForSequenceClassification for reward modeling
Supports configurable dataset columns for comparisons
Implements efficient batch processing with proper tokenization
Handles model checkpointing and evaluation

Testing

The implementation has been tested with:

Dataset preparation and tokenization
Model training pipeline
Hub integration for model sharing

Next Steps

Add evaluation metrics for reward model performance
Implement additional data preprocessing options
Add documentation for training configuration
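For the first item, one common reward model metric is pairwise accuracy. This is the fraction of held-out pairs where the model gives the better response a higher score than the worse one. TRL's `RewardTrainer` reports an accuracy metric of this kind by default during evaluation. The minimal sketch below works on precomputed scalar scores. The function name is hypothetical, and counting ties as errors is an assumption that other implementations may not share.

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float], rejected_scores: Sequence[float]) -> float:
    """Fraction of pairs where the chosen (better) response gets the strictly higher score.

    Ties count as errors here (an assumption; other implementations may differ).
    """
    if len(chosen_scores) != len(rejected_scores) or len(chosen_scores) == 0:
        raise ValueError("need equal-length, non-empty score lists")
    correct = sum(1 for rc, rr in zip(chosen_scores, rejected_scores) if rc > rr)
    return correct / len(chosen_scores)
```

With scores `[1.0, 0.2, 3.0, 0.5]` for the better responses and `[0.0, 0.9, 1.0, 0.5]` for the worse ones, two of four pairs are ranked correctly, so the accuracy is 0.5.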

qgallouedec · 2025-01-27T15:29:43Z

src/open_r1/reward_model.py

+    processed_dataset = prepare_comparison_dataset(dataset, tokenizer, script_args)
+
+    # Initialize trainer
+    trainer = Trainer(


Thanks! Any reason for not using the RewardTrainer?

ahmeterdempmk · Jan 27, 2025

The implementation focuses on direct preference comparison training with better/worse response pairs, which the standard Trainer handles effectively with our custom dataset preparation.

qgallouedec · Jan 30, 2025

Okay, but that's exactly what the RewardTrainer does, isn't it?
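Some context for this exchange: a reward model trained on better/worse pairs is typically optimized with a pairwise (Bradley-Terry) loss, −log σ(r_chosen − r_rejected), averaged over pairs. TRL's `RewardTrainer` applies this loss to the scalar scores of a single-output sequence-classification model, with an optional per-pair margin subtracted inside the sigmoid. The dependency-free sketch below works on precomputed scores. The function name and the plain-float inputs are illustrative assumptions; the real trainer operates on batched tensors.

```python
import math
from typing import Optional, Sequence


def pairwise_reward_loss(
    chosen_scores: Sequence[float],
    rejected_scores: Sequence[float],
    margins: Optional[Sequence[float]] = None,
) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected - margin)) over all pairs."""
    n = len(chosen_scores)
    if n == 0 or len(rejected_scores) != n:
        raise ValueError("need equal-length, non-empty score lists")
    if margins is None:
        margins = [0.0] * n
    elif len(margins) != n:
        raise ValueError("margins must match the number of pairs")
    total = 0.0
    for rc, rr, m in zip(chosen_scores, rejected_scores, margins):
        x = rc - rr - m
        # Numerically stable -log(sigmoid(x)) == softplus(-x)
        total += max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / n
```

Equal scores give a loss of log 2 (about 0.693). The loss shrinks toward 0 as the better response's score pulls ahead and grows roughly linearly when the worse response is scored higher.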

qgallouedec · 2025-01-30T15:37:42Z

Can you also explain which part of the paper this addition corresponds to? It seems to me that the idea behind the RL of DeepSeek-R1 is to free it from reward models.
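Background for this question: the DeepSeek-R1 paper states that DeepSeek-R1-Zero is trained without a neural outcome or process reward model. It uses rule-based rewards instead: an accuracy reward that checks the final answer deterministically, and a format reward that requires the reasoning to sit between `<think>` and `</think>` tags. The minimal sketch below shows rewards of that kind. The regex, the `<answer>` tag convention, exact string matching, and the 1.0/0.0 values are illustrative assumptions, not code taken from the paper or from this repository.

```python
import re

# Hypothetical template: a <think>...</think> block followed by an <answer>...</answer> block.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think/answer template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the extracted answer exactly matches the reference (after stripping), else 0.0."""
    match = FORMAT_PATTERN.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0
```

These rewards need no learned model: they are computed directly from the completion text, which is the contrast the question above draws with a trained reward model.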

67a6a07 · feat: Added reward model according to paper.

qgallouedec reviewed Jan 27, 2025

ebadec2 · Merge branch 'main' into reward-model
