
Aria cannot run properly #116

Open
DeadLining opened this issue Feb 19, 2025 · 2 comments

Comments

@DeadLining

Shell script:
torchrun --nproc_per_node=1 \
    src/open_r1/grpo.py \
    --output_dir checkpoints/${WANDB_RUN_NAME} \
    --model_name_or_path /gpu/nfs/raymodel/rhymes-ai/Aria \
    --deepspeed local_scripts/zero3.json \
    --eval_strategy steps \
    --eval_steps 2000 \
    --max_prompt_length 10240 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 3 \
    --logging_steps 1 \
    --max_completion_length 2000 \
    --bf16 \
    --report_to wandb \
    --gradient_checkpointing true \
    --attn_implementation eager \
    --max_pixels 2359296 \
    --save_total_limit 8 \
    --save_only_model true \
    --save_steps 200 \
    --num_train_epochs 3 \
    --num_generations 5 \
    --run_name $WANDB_RUN_NAME

transformers version: 4.49.0.dev0

Error when running:
[rank0]: ValueError: AriaForConditionalGeneration does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co//gpu/nfs/raymodel/rhymes-ai/Aria/discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new

@TobiasLee
Collaborator

You may need to turn off the flash_attention?

@DeadLining
Author

> You may need to turn off the flash_attention?

I set attn_implementation to eager, so flash_attention should not be enabled.
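For reference, a minimal sketch (not from the original report) of how attn_implementation is normally handed to transformers at model load time, assuming the training script forwards the --attn_implementation flag through to from_pretrained; the local model path is the one from the script above:

```python
import torch
from transformers import AriaForConditionalGeneration

# Assumption: src/open_r1/grpo.py forwards --attn_implementation to from_pretrained.
# With attn_implementation="eager" the Flash Attention 2 support check should not run;
# the ValueError above is normally raised only when "flash_attention_2" is requested
# somewhere (e.g. a default in the trainer config or another model load path).
model = AriaForConditionalGeneration.from_pretrained(
    "/gpu/nfs/raymodel/rhymes-ai/Aria",   # local path from the script above
    attn_implementation="eager",          # "sdpa" is another non-FA2 option
    torch_dtype=torch.bfloat16,
)
```

If a snippet like this loads cleanly outside the trainer, the flash_attention_2 request is likely coming from another load site in the pipeline rather than from the --attn_implementation eager flag.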
