
Aria cannot run properly #116

Open
DeadLining opened this issue Feb 19, 2025 · 2 comments

Comments

@DeadLining

Shell script:
torchrun --nproc_per_node=1 \
    src/open_r1/grpo.py \
    --output_dir checkpoints/${WANDB_RUN_NAME} \
    --model_name_or_path /gpu/nfs/raymodel/rhymes-ai/Aria \
    --deepspeed local_scripts/zero3.json \
    --eval_strategy steps \
    --eval_steps 2000 \
    --max_prompt_length 10240 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 3 \
    --logging_steps 1 \
    --max_completion_length 2000 \
    --bf16 \
    --report_to wandb \
    --gradient_checkpointing true \
    --attn_implementation eager \
    --max_pixels 2359296 \
    --save_total_limit 8 \
    --save_only_model true \
    --save_steps 200 \
    --num_train_epochs 3 \
    --num_generations 5 \
    --run_name $WANDB_RUN_NAME

transformers version: 4.49.0.dev0

Error when running:
[rank0]: ValueError: AriaForConditionalGeneration does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co//gpu/nfs/raymodel/rhymes-ai/Aria/discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new

@TobiasLee
Collaborator

You may need to turn off the flash_attention?

@DeadLining
Author

> You may need to turn off the flash_attention?

I set attn_implementation to eager, so flash_attention should not be enabled.
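For reference, a minimal sketch (not from the original report) of how attn_implementation is normally handed to transformers at model load time, assuming the training script forwards the --attn_implementation flag through to from_pretrained; the local model path is the one from the script above:

```python
import torch
from transformers import AriaForConditionalGeneration

# Assumption: src/open_r1/grpo.py forwards --attn_implementation to from_pretrained.
# With attn_implementation="eager" the Flash Attention 2 support check should not run;
# the ValueError above is normally raised only when "flash_attention_2" is requested
# somewhere (e.g. a default in the trainer config or another model load path).
model = AriaForConditionalGeneration.from_pretrained(
    "/gpu/nfs/raymodel/rhymes-ai/Aria",   # local path from the script above
    attn_implementation="eager",          # "sdpa" is another non-FA2 option
    torch_dtype=torch.bfloat16,
)
```

If a snippet like this loads cleanly outside the trainer, the flash_attention_2 request is likely coming from another load site in the pipeline rather than from the --attn_implementation eager flag.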
