```shell
torchrun --nnodes $NNODES --nproc_per_node $GPUS_PER_NODE \
  --node_rank $SLURM_NODEID \
  --master_addr $(scontrol show hostname $SLURM_NODELIST | head -n1) \
  --master_port ${MASTER_PORT} \
  ./src/open_r1/grpo.py \
  --output_dir ${SAVE_PATH} \
  --model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
  --dataset_name ${DATA_PATH} \
  --deepspeed ./local_scripts/zero3.json \
  --max_prompt_length 1024 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 2 \
  --logging_steps 1 \
  --bf16 \
  --report_to wandb \
  --gradient_checkpointing false \
  --attn_implementation flash_attention_2 \
  --max_pixels 401408 \
  --num_train_epochs 1 \
  --run_name Qwen2-VL-2B-Debug \
  --save_steps 100 \
  --save_only_model true \
  --num_generations 8
```
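A few of the flags in the command above are the usual levers for reducing GRPO memory use (this is a suggestion based only on the flags already present in the command, not a confirmed fix): gradient checkpointing is currently off, and each training step holds `num_generations` completions per prompt in memory at once. A lower-memory variant of the same flags might look like:

```shell
# Hedged sketch: same script, with the memory-related flags dialed down.
# Whether these values fit on 8x A100 is untested here.
  --gradient_checkpointing true \
  --num_generations 4 \
  --max_pixels 200704 \
```

Halving `--num_generations` roughly halves the activation memory for the generation/scoring phase, at the cost of a noisier GRPO advantage estimate.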
When I try to train Qwen2.5-VL-3B on 8×A100, it runs out of memory. Why is that?
Same problem. Judging by the logs, it may be caused by the flash_attention_2 backend not being used correctly, but I don't know how to fix it.
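If the suspicion is that `flash_attention_2` isn't actually being picked up, a quick sanity check is to verify the `flash_attn` package is importable in the training environment (if it isn't, `transformers` either errors out or silently falls back, depending on version). This is a generic diagnostic sketch, not code from this repo; the commented model-loading lines assume a recent `transformers` that records the backend in `config._attn_implementation`:

```python
# Hedged diagnostic: is flash-attn even installed in this environment?
import importlib.util

has_flash = importlib.util.find_spec("flash_attn") is not None
print("flash-attn installed:", has_flash)

# If installed, you can also confirm the backend on the loaded model, e.g.:
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen2.5-VL-3B-Instruct",
#     attn_implementation="flash_attention_2",
#     torch_dtype="bfloat16",
# )
# print(model.config._attn_implementation)  # expect "flash_attention_2"
```

If `has_flash` is False inside the job (it can differ from your login shell if SLURM loads different modules), attention falls back to a much more memory-hungry implementation, which would explain the OOM.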
Same problem.
Same problem here. Why does it consume all the memory even for Qwen2-VL-2B-Instruct?
Same question. I suspect there may be a bug somewhere in the code; with settings this small, it shouldn't OOM.