
Qwen2.5-VL-3B OOM #107

Open
Liuziyu77 opened this issue Feb 17, 2025 · 4 comments
@Liuziyu77
torchrun --nnodes $NNODES --nproc_per_node $GPUS_PER_NODE --node_rank $SLURM_NODEID --master_addr $(scontrol show hostname $SLURM_NODELIST | head -n1) --master_port ${MASTER_PORT} ./src/open_r1/grpo.py \
    --output_dir ${SAVE_PATH}  \
    --model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
    --dataset_name ${DATA_PATH} \
    --deepspeed ./local_scripts/zero3.json \
    --max_prompt_length 1024 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --logging_steps 1 \
    --bf16 \
    --report_to wandb \
    --gradient_checkpointing false \
    --attn_implementation flash_attention_2 \
    --max_pixels 401408 \
    --num_train_epochs 1 \
    --run_name Qwen2-VL-2B-Debug \
    --save_steps 100 \
    --save_only_model true \
    --num_generations 8

When I try to train Qwen2.5-VL-3B on 8×A100 with the command above, it runs out of memory. Why is that?
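
For reference: with GRPO, --num_generations 8 means every prompt is expanded into 8 sampled completions, so the per-device memory during generation and the loss pass is far larger than --per_device_train_batch_size 1 suggests, and --gradient_checkpointing is disabled here. One common first mitigation is ZeRO-3 CPU offload. Below is a minimal sketch using DeepSpeed's standard ZeRO-3 config schema; the file name zero3_offload.json is hypothetical (this is not the repo's actual zero3.json), and the root cause here is not confirmed.

import json

# Hypothetical path; point --deepspeed at wherever you write this file.
CONFIG_PATH = "./local_scripts/zero3_offload.json"

zero3_offload = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # Offload optimizer state and parameters to CPU RAM to cut GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    # "auto" lets the HF Trainer fill these in from its own arguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open(CONFIG_PATH, "w") as f:
    json.dump(zero3_offload, f, indent=2)

Enabling --gradient_checkpointing true and lowering --num_generations or --max_pixels in the command above are the other obvious knobs to try.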

@wnzhyee

wnzhyee commented Feb 17, 2025

Same problem. Judging from the logs, it may be caused by flash_attention_2 not being used correctly, but I don't know how to change it.
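
A minimal sanity check for that hypothesis, assuming a transformers version with Qwen2.5-VL support and the flash-attn package installed: load the model the way the trainer does and inspect which attention implementation was actually selected, since a silent fallback would use a more memory-hungry attention path.

import torch
from transformers import Qwen2_5_VLForConditionalGeneration

# flash-attn requires fp16/bf16; loading in fp32 would disable it.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Should print "flash_attention_2"; "sdpa" or "eager" means it fell back.
print(model.config._attn_implementation)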

@lzk9508
Copy link

lzk9508 commented Feb 20, 2025


Same problem.

@YEXINGZHE54

Same problem. Why does it consume all the memory even for Qwen2-VL-2B-Instruct?

@tcy6

tcy6 commented Feb 23, 2025

Same question. I suspect there may be a bug somewhere in the code; with settings this small, it shouldn't OOM.
