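The log for 200 training steps is shown above. completion_length first decreases and then increases, while acc_reward and format_reward keep increasing. In the README, however, format_reward ends up at essentially 0 and completion_length eventually drops to around 20, and the kl curve does not match the README either. No "aha moment" appears in debug_log_2b.txt. The final accuracy on SuperCLEVR test200_counting is 74%, which falls short of the reported 82.5%.
The training command is below, run on 8 A100 GPUs: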
cd src/r1-v
export DEBUG_MODE="true"
export LOG_PATH="./debug_log_2b.txt"
torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="12345" \
    src/open_r1/grpo.py \
    --output_dir /root/code/R1-V/clevr_cogen_a_train \
    --model_name_or_path /modelscope/hub/Qwen/Qwen2-VL-2B-Instruct \
    --dataset_name clevr_cogen_a_train \
    --deepspeed local_scripts/zero3.json \
    --max_prompt_length 512 \
    --max_completion_length 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --logging_steps 1 \
    --bf16 \
    --report_to tensorboard \
    --gradient_checkpointing false \
    --attn_implementation flash_attention_2 \
    --max_pixels 401408 \
    --num_train_epochs 2 \
    --run_name Qwen2-VL-2B-GRPO-CLEVR-70k \
    --save_steps 100 \
    --save_only_model true \
    --num_generations 8
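For context on how much data 200 steps covers, here is a rough back-of-the-envelope sketch in Python. It assumes that each GPU processes `per_device_train_batch_size` prompts per micro-batch and samples `num_generations` completions for each prompt. That is an assumption about the trainer, not something confirmed in this thread; if the trainer instead counts completions in the per-device batch, the numbers differ. The function name `grpo_batch_stats` is hypothetical.

```python
def grpo_batch_stats(num_gpus, per_device_batch, grad_accum, num_generations, steps):
    """Estimate prompts/completions consumed, ASSUMING the per-device batch
    counts prompts and each prompt gets num_generations sampled completions."""
    prompts_per_step = num_gpus * per_device_batch * grad_accum
    completions_per_step = prompts_per_step * num_generations
    return {
        "prompts_per_step": prompts_per_step,
        "completions_per_step": completions_per_step,
        "prompts_seen": prompts_per_step * steps,
    }


# Values taken from the command above: 8 GPUs, batch size 1,
# gradient accumulation 2, 8 generations, 200 logged steps.
stats = grpo_batch_stats(num_gpus=8, per_device_batch=1, grad_accum=2,
                         num_generations=8, steps=200)
print(stats)
```

Under that assumption, each optimizer step uses 16 prompts (128 completions), so 200 steps cover about 3,200 prompts. If the README run used a different GPU count or accumulation setting, the same step index would correspond to a different amount of data, which may be worth checking when comparing curves.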
How about evaluating an earlier checkpoint?
The 74% accuracy is the evaluation result for the checkpoint saved at step 100.