
(Training qwen2.5-VL-7B-Instruct) AssertionError: Input and cos/sin must have the same dtype, got torch.float16 and torch.bfloat16 #105

Open · six-finger opened this issue Feb 17, 2025 · 8 comments

@six-finger

Bash file: (screenshots attached)

Log: (screenshots attached)

@lky-violet

Hello, when I switched the model from Qwen2.5-VL-3B-Instruct to Qwen2-VL-2B-Instruct, the error was resolved. I suspect it might be due to differences in model precision?
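
If it really is a precision mismatch, one way to check (a minimal sketch, not from the original report; it assumes a transformers build with Qwen2.5-VL support, and the exact class name may differ between versions) is to load the checkpoint with an explicit dtype and see what the parameters actually end up as:

```python
# Minimal sketch: force a single dtype end to end and confirm what was loaded.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,              # load weights in bf16
    attn_implementation="flash_attention_2",
)
print(next(model.parameters()).dtype)        # expect torch.bfloat16
```

If the weights come back as float16 while training runs with `--bf16`, that would line up with the float16 vs. bfloat16 mismatch in the assertion.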

@six-finger (Author)

> Hello, when I switched the model from Qwen2.5-VL-3B-Instruct to Qwen2-VL-2B-Instruct, the error was resolved. I suspect it might be due to differences in model precision?

This issue appears to be due to changes in the transformers library version. A similar issue (huggingface/transformers#36188) points to a specific transformers commit (f7a3c62), but after installing that commit I encountered a new error:

(screenshot of the new error attached)
@weizhepei

+1 Same issue when using this script:

```bash
CUDA_VISIBLE_DEVICES="1,2,3,4,5,6,7" torchrun --nproc_per_node="7" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="12345" \
    src/open_r1/grpo.py \
    --output_dir $OUTPUT_DIR \
    --model_name_or_path $QWEN_PATH \
    --dataset_name $HF_DATASET \
    --max_prompt_length 512 \
    --max_completion_length 1024 \
    --temperature 1.0 \
    --num_generations 4 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --logging_steps 1 \
    --bf16 \
    --report_to wandb \
    --gradient_checkpointing false \
    --attn_implementation flash_attention_2 \
    --max_pixels 401408 \
    --num_train_epochs 2 \
    --run_name $RUN_NAME \
    --save_steps 100 \
    --save_only_model true \
    --deepspeed local_scripts/zero3.json
```

Lib versions:

```
flash-attn                2.7.4.post1              pypi_0    pypi
r1-v                      0.1.0                     dev_0    <develop>
transformers              4.50.0.dev0              pypi_0    pypi
vllm                      0.7.2                    pypi_0    pypi
```

@TobiasLee Any pointers on this issue? 👀

@robinjoe93

This may be a DeepSpeed error. I ran the command without `--deepspeed local_scripts/zero3.json` and it works.

@lky-violet

> This may be a DeepSpeed error. I ran the command without `--deepspeed local_scripts/zero3.json` and it works.

I tried your method and removed `--deepspeed local_scripts/zero3.json`. I only have 4 A100 GPUs, but when I run the code with `export CUDA_VISIBLE_DEVICES="0,1,6,7"`, it fails with **CUDA out of memory. Tried to allocate 30.00 MiB. GPU 3 has a total capacity of 79.15 GiB**. What should I do?

@robinjoe93

> > This may be a DeepSpeed error. I ran the command without `--deepspeed local_scripts/zero3.json` and it works.
>
> I tried your method and removed `--deepspeed local_scripts/zero3.json`. I only have 4 A100 GPUs, but when I run the code with `export CUDA_VISIBLE_DEVICES="0,1,6,7"`, it fails with **CUDA out of memory. Tried to allocate 30.00 MiB. GPU 3 has a total capacity of 79.15 GiB**. What should I do?

Decrease `max_prompt_length`, `num_generations`, and `max_completion_length`.
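
For reference, a sketch of what the reduced settings would look like (this assumes grpo.py forwards the CLI flags to TRL's GRPOConfig, which is how these training scripts are typically wired; the values below are only illustrative):

```python
# Illustrative values only: smaller sequence and generation budgets to cut peak memory.
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="outputs/qwen2_5_vl_grpo",  # hypothetical path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    max_prompt_length=256,                 # down from 512
    max_completion_length=512,             # down from 1024
    num_generations=2,                     # down from 4
    bf16=True,
)
```

Lowering `num_generations` and `max_completion_length` tends to help most, since GRPO samples several completions per prompt and those generations dominate peak memory.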

@Syazvinski

Temporary fix:

```
pip install git+https://github.com/huggingface/transformers.git@8ee50537fe7613b87881cd043a85971c85e99519
```
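
After installing the pinned commit, a quick sanity check (not part of the original fix) that the training environment actually imports that build:

```python
# Sanity check: confirm which transformers build the environment imports.
import transformers

print(transformers.__version__)  # a git install should report a *.dev0 version
print(transformers.__file__)     # shows the install location the import resolves to
```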

@llliuxiao

> Temporary fix: pip install git+https://github.com/huggingface/transformers.git@8ee50537fe7613b87881cd043a85971c85e99519

It works!
