This is the official implementation of our paper "AP2O-Coder: Human-Inspired Progressive Optimization to Fix LLM Code Errors", accepted at AAAI'26.
Adaptive Progressive Preference Optimization
- deepspeed 0.17.2
- python 3.11.11
- torch 2.7.0
- trl 0.14.0
- transformers 4.51.3
- vllm 0.9.2
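Before launching the pipeline, it may help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of this repository: the helper name `check_environment` and the `PINNED` table are hypothetical, and the check only relies on the standard-library `importlib.metadata` module. It assumes that a local build tag such as `+cu126` on a wheel version can be ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list above (hypothetical helper table;
# the Python interpreter version itself is not checked here).
PINNED = {
    "deepspeed": "0.17.2",
    "torch": "2.7.0",
    "trl": "0.14.0",
    "transformers": "4.51.3",
    "vllm": "0.9.2",
}


def check_environment(pinned=PINNED, get_version=version):
    """Return {package: (expected, found)} for every package that deviates.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        # Assumption: drop local build tags such as "+cu126" before comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_environment()
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

An empty result means every pinned package is installed at the listed version. Other versions may also work, but they are not covered by the list above.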
To start preference data self-generation and preference optimization, run the following command:
sh pipe-qwen2.5-coder.sh

