Dear authors, this is very interesting work! I wonder: 1) Did you run the DPO training for several rounds (iterative DPO)? 2) Can you release the script for the DPO training? Thanks!