Cannot reproduce results #112

zhiwenhou1227 · 2025-02-18T08:48:06Z

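The log for 200 training steps is shown above.
completion_length first decreases and then increases, while acc_reward and format_reward keep rising. In the README, however, format_reward ends up at roughly 0 and completion_length eventually drops to around 20. The KL curve does not match the README either, and debug_log_2b.txt shows no "aha moment". The final accuracy on SuperCLEVR test200_counting is 74%, which falls short of the reported 82.5%.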

Training command (8× A100):

cd src/r1-v

export DEBUG_MODE="true"
export LOG_PATH="./debug_log_2b.txt"

torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="12345" \
    src/open_r1/grpo.py \
    --output_dir /root/code/R1-V/clevr_cogen_a_train \
    --model_name_or_path /modelscope/hub/Qwen/Qwen2-VL-2B-Instruct \
    --dataset_name clevr_cogen_a_train \
    --deepspeed local_scripts/zero3.json \
    --max_prompt_length 512 \
    --max_completion_length 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --logging_steps 1 \
    --bf16 \
    --report_to tensorboard \
    --gradient_checkpointing false \
    --attn_implementation flash_attention_2 \
    --max_pixels 401408 \
    --num_train_epochs 2 \
    --run_name Qwen2-VL-2B-GRPO-CLEVR-70k \
    --save_steps 100 \
    --save_only_model true \
    --num_generations 8
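When comparing this run's curves against the README, one quick sanity check is the number of prompts consumed per optimizer step implied by the flags in the command. The sketch below assumes the usual Hugging Face Trainer convention (global batch = nodes × processes per node × per-device batch × gradient-accumulation steps); the function name is hypothetical, and how `--num_generations` further multiplies the number of sampled completions depends on the GRPO trainer version, so it is deliberately not modeled here.

```python
def global_prompt_batch(
    nnodes: int,
    nproc_per_node: int,
    per_device_train_batch_size: int,
    gradient_accumulation_steps: int,
) -> int:
    """Hypothetical helper: prompts per optimizer step under the
    standard HF Trainer convention (assumption, not from the repo)."""
    world_size = nnodes * nproc_per_node
    return world_size * per_device_train_batch_size * gradient_accumulation_steps


# Values copied from the torchrun flags in the command above.
batch = global_prompt_batch(
    nnodes=1,
    nproc_per_node=8,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
)
print(batch)  # 16
```

If the README run used a different GPU count or accumulation setting, the same number of logged steps would correspond to a different amount of training data, which is worth ruling out before comparing curves step by step.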

TobiasLee · 2025-02-19T06:22:55Z

How about evaluating an earlier checkpoint?

zhiwenhou1227 · 2025-02-19T06:38:50Z

> How about evaluating an earlier checkpoint?

The 74% accuracy is the evaluation result of the checkpoint saved at step 100.
