GitHub - TsingZ0/AP2O: AAAI'26, AP2O-Coder: Human-Inspired Progressive Optimization to Fix LLM Code Errors

Introduction

Adaptive Progressive Preference Optimization

Coding Error Reduction

Data Efficiency

Reward Curves

To initiate the preference data self-generation and preference optimization processes, use the following command:

sh pipe-qwen2.5-coder.sh
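
As a rough illustration of what self-generated preference data can look like, the sketch below pairs passing and failing code samples for one prompt in the common prompt/chosen/rejected layout used for DPO-style training. This is a minimal sketch under stated assumptions, not the repository's implementation: the function name `build_preference_pairs`, the `(code, passed)` input shape, and the output field names are all hypothetical, and the format actually produced by the scripts in this repository is not shown here.

```python
import json


def build_preference_pairs(prompt, samples):
    """Pair every passing sample with every failing sample for one prompt.

    `samples` is a list of (code, passed) tuples. This input shape is a
    hypothetical choice for illustration, not the repository's format.
    """
    passed = [code for code, ok in samples if ok]
    failed = [code for code, ok in samples if not ok]
    # One record per (passing, failing) combination; the field names follow
    # the common prompt/chosen/rejected convention (an assumption).
    return [
        {"prompt": prompt, "chosen": good, "rejected": bad}
        for good in passed
        for bad in failed
    ]


# Toy samples: one that would pass its unit tests and one that would not.
samples = [
    ("def add(a, b):\n    return a + b", True),
    ("def add(a, b):\n    return a - b", False),
]
pairs = build_preference_pairs("Write add(a, b).", samples)
print(json.dumps(pairs[0], indent=2))
```

A prompt whose samples all pass or all fail yields no pairs under this scheme, since a pair needs one sample of each kind.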

Repository contents:

datasets
figs
.gitignore
LICENSE
README.md
convert_dpo_format.py
convert_hf_dataset.py
delete_global_step_folders.py
ds_zero-0.json
ds_zero.json
infer.py
merge.py
pipe-qwen2.5-coder.sh
pipelining.py
train.py