Fix backward error in DDP when running reward model finetuning in RLHF #507
Conversation
I don't know why; it only occurs in the multi-card case. A single card does not have this issue.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
The script is adapted from https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py; I just made it work on Habana.
The reward modeling compute_loss is a little different from a standard Trainer loss, see https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py#L268-L271. I'm not sure whether this is the cause of the issue; a sketch of that loss shape is shown below.
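For context, here is a minimal sketch of the pairwise reward-modeling loss referenced above: the same reward model is run twice per step, once on the preferred ("j") response and once on the rejected ("k") response, and the loss pushes the preferred reward above the rejected one. The field names input_ids_j/input_ids_k follow the TRL example; the rest is illustrative rather than the exact upstream code.

```python
import torch.nn.functional as F

def compute_pairwise_reward_loss(model, inputs, return_outputs=False):
    # Two forward passes through the same reward model per training step:
    # one for the preferred ("j") sequence, one for the rejected ("k") sequence.
    rewards_j = model(
        input_ids=inputs["input_ids_j"], attention_mask=inputs["attention_mask_j"]
    )[0]
    rewards_k = model(
        input_ids=inputs["input_ids_k"], attention_mask=inputs["attention_mask_k"]
    )[0]
    # Pairwise ranking loss: maximize the margin of the preferred reward
    # over the rejected reward.
    loss = -F.logsigmoid(rewards_j - rewards_k).mean()
    if return_outputs:
        return loss, {"rewards_j": rewards_j, "rewards_k": rewards_k}
    return loss
```

The double forward pass per step is the main structural difference from a standard Trainer compute_loss.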
@regisss, no, I've never seen this behavior before.
@regisss, could you help merge this PR? I am enabling RLHF (PPO) on Gaudi2; the basic functionality is now working for reward modeling and reinforcement learning, and performance looks promising. Later I would like to clean up the code and upload the PPO- and DPO-related examples to optimum-habana.
@sywangyi, please file a JIRA ticket with Habana including a simple test case to reproduce the problem. We need to investigate the root cause before we merge any workaround.
@mandy-li, I have filed a JIRA ticket in the Habana JIRA system.
I still see this issue in SW release 1.13. Per @mandy-li, we could merge this as a workaround and remove it once the problem is fixed in Synapse. I tested on my side and did not see any performance regression on the fine-tuning or inference side.
@mandy-li, could you comment on this?
Yes, I didn't see any performance degradation.
Sounds good! And then I'll merge it!
Signed-off-by: Wang, Yi A <[email protected]>
I am enabling RLHF on Habana. When enabling reward model fine-tuning on 8 Gaudi2 cards using DDP, an error happens in backward.
The code is adapted from
https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/examples/finetuning/ppo_pipeline/reward_modeling.py
The command is
python ../instruction/gaudi_spawn.py --world_size 8 --use_mpi reward_modeling.py --model_name_or_path meta-llama/Llama-2-7b-hf --log_level info --num_train_epochs 3 --use_habana --output_dir output --ddp_find_unused_parameters True --logging_steps 10 --use_lazy_mode --evaluation_strategy="steps"
The error is
Traceback (most recent call last):
File "/root/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/examples/finetuning/ppo_pipeline/reward_modeling.py", line 475, in
trainer.train()
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 504, in train
return inner_training_loop(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 837, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 1361, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1989, in backward
loss.backward(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 498, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpex/kernels/RotaryPosEmbeddingHelper.py", line 157, in backward
cos, sin, position_ids = ctx.saved_tensors
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [HPUBFloat16Type [1, 1, 512, 128]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
0%| | 0/354 [00:07<?, ?it/s]
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[3851,1],5]
Exit code: 1
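For reference, the RuntimeError in the log is autograd's version-counter check: a tensor that was saved for the backward pass (here the cos/sin tensors held by the HPU rotary-embedding helper) was modified in place after the forward pass, so its version no longer matches the saved one. Below is a minimal, device-agnostic sketch of the same error class; it is purely illustrative and unrelated to the actual Habana kernel.

```python
import torch

cache = torch.randn(4)                 # e.g. a cos/sin-style buffer reused across steps
x = torch.randn(4, requires_grad=True)

y = x * cache                          # autograd saves `cache` to compute dL/dx later
cache.add_(1.0)                        # in-place update bumps `cache`'s version counter

try:
    y.sum().backward()                 # backward detects the version mismatch and raises
except RuntimeError as err:
    print(err)                         # "... has been modified by an inplace operation ..."

# As the hint in the log suggests, torch.autograd.set_detect_anomaly(True) makes autograd
# record forward-pass stack traces, which helps locate the offending in-place operation.
```

This matches the report above that the error only appears in the multi-card DDP run, where the saved rotary cos/sin tensors end up at a newer version than backward expects.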