
Cannot reproduce the results #112

Open
zhiwenhou1227 opened this issue Feb 18, 2025 · 2 comments

Comments

@zhiwenhou1227

zhiwenhou1227 commented Feb 18, 2025

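[Image: training log for 200 steps]
The log for 200 training steps is shown above.
completion_length first decreases and then increases, while acc_reward and format_reward keep increasing. In the README, however, format_reward ends up at essentially 0 and completion_length drops to around 20. The kl curve does not match the README either, and debug_log_2b.txt shows no "aha moment". The final accuracy on the superclevr test200_counting set is 74%, which falls short of the reported 82.5%.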

The training command is as follows, run on 8 A100 GPUs:

cd src/r1-v

export DEBUG_MODE="true"
export LOG_PATH="./debug_log_2b.txt"

torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="12345" \
    src/open_r1/grpo.py \
    --output_dir /root/code/R1-V/clevr_cogen_a_train \
    --model_name_or_path /modelscope/hub/Qwen/Qwen2-VL-2B-Instruct \
    --dataset_name clevr_cogen_a_train \
    --deepspeed local_scripts/zero3.json \
    --max_prompt_length 512 \
    --max_completion_length 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --logging_steps 1 \
    --bf16 \
    --report_to tensorboard \
    --gradient_checkpointing false \
    --attn_implementation flash_attention_2 \
    --max_pixels 401408 \
    --num_train_epochs 2 \
    --run_name Qwen2-VL-2B-GRPO-CLEVR-70k \
    --save_steps 100 \
    --save_only_model true \
    --num_generations 8
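
When comparing this run against another setup, it may help to work out the effective batch size implied by the flags above. The snippet below is only a rough sketch of that arithmetic and is not taken from the repository: the function name `effective_batch` is hypothetical, and whether `per_device_train_batch_size` counts prompts or completions is an assumption that depends on the TRL version in use, so both readings are shown.

```python
def effective_batch(num_gpus, per_device_batch, grad_accum, num_generations):
    """Hypothetical helper: batch-size arithmetic for the flags in the command."""
    # Samples consumed per optimizer step across all GPUs.
    per_step = num_gpus * per_device_batch * grad_accum
    return {
        "samples_per_optimizer_step": per_step,
        # Assumption A: the batch counts prompts, each expanded into
        # num_generations completions.
        "completions_if_batch_counts_prompts": per_step * num_generations,
        # Assumption B: the batch counts completions, so the number of
        # distinct prompts is smaller by a factor of num_generations.
        "prompts_if_batch_counts_completions": per_step / num_generations,
    }


if __name__ == "__main__":
    # Values copied from the command: 8 GPUs, batch size 1,
    # gradient accumulation 2, 8 generations per prompt.
    for name, value in effective_batch(8, 1, 2, 8).items():
        print(f"{name}: {value}")
```

If the reference run used a different GPU count or accumulation setting, these numbers would differ, which could be one of several factors behind diverging reward and KL curves.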

@TobiasLee
Collaborator

How about evaluating an earlier checkpoint?

@zhiwenhou1227
Author

How about evaluating an earlier checkpoint?

The 74% accuracy is the evaluation result for the checkpoint saved at step 100.
