Fix backward error in DDP when running reward model finetuning in RLHF #507
Conversation
I don't know why; it only occurs in the multi-card case. A single card does not have this issue.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
The script is adapted from https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py; I just made it work on Habana.
The reward modeling compute_loss is a little different from a standard Trainer loss, see https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py#L268-L271. I'm not sure whether this is the cause of the issue; a sketch of that loss shape is shown below.
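For context, here is a minimal sketch of the pairwise reward-modeling loss referenced above: the same reward model is run twice per step, once on the preferred ("j") response and once on the rejected ("k") response, and the loss pushes the preferred reward above the rejected one. The field names input_ids_j/input_ids_k follow the TRL example; the rest is illustrative rather than the exact upstream code.

```python
import torch.nn.functional as F

def compute_pairwise_reward_loss(model, inputs, return_outputs=False):
    # Two forward passes through the same reward model per training step:
    # one for the preferred ("j") sequence, one for the rejected ("k") sequence.
    rewards_j = model(
        input_ids=inputs["input_ids_j"], attention_mask=inputs["attention_mask_j"]
    )[0]
    rewards_k = model(
        input_ids=inputs["input_ids_k"], attention_mask=inputs["attention_mask_k"]
    )[0]
    # Pairwise ranking loss: maximize the margin of the preferred reward
    # over the rejected reward.
    loss = -F.logsigmoid(rewards_j - rewards_k).mean()
    if return_outputs:
        return loss, {"rewards_j": rewards_j, "rewards_k": rewards_k}
    return loss
```

The double forward pass per step is the main structural difference from a standard Trainer compute_loss.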
@regisss, no, I've never seen this behavior before.
@regisss, could you help merge this PR? I am enabling RLHF (PPO) on Gaudi2; the basic functionality is now working for reward modeling and reinforcement learning, and performance looks promising. Later I would like to clean up the code and upload the PPO- and DPO-related examples to optimum-habana.
@sywangyi, please file a JIRA ticket with Habana including a simple test case to reproduce the problem. We need to investigate the root cause before we merge any workaround.
@mandy-li, I have filed a JIRA ticket in the Habana JIRA system.
I still see this issue in SW release 1.13. Per @mandy-li, we could merge this as a workaround and remove it once the problem is fixed in Synapse. I tested on my side and did not see any performance regression on the fine-tuning or inference side.
@mandy-li, could you comment on this?
Yes, I didn't see any performance degradation.
Sounds good! And then I'll merge it!
Signed-off-by: Wang, Yi A <[email protected]>
I am enabling RLHF on Habana. When enabling reward model fine-tuning on 8 Gaudi2 cards using DDP, an error happens in backward.
The code is adapted from
https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/examples/finetuning/ppo_pipeline/reward_modeling.py
The command is
python ../instruction/gaudi_spawn.py --world_size 8 --use_mpi reward_modeling.py --model_name_or_path meta-llama/Llama-2-7b-hf --log_level info --num_train_epochs 3 --use_habana --output_dir output --ddp_find_unused_parameters True --logging_steps 10 --use_lazy_mode --evaluation_strategy="steps"
The error is
Traceback (most recent call last):
File "/root/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/examples/finetuning/ppo_pipeline/reward_modeling.py", line 475, in
trainer.train()
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 504, in train
return inner_training_loop(
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 837, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/intel-extension-for-transformers/optimum-habana/optimum/habana/transformers/trainer.py", line 1361, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1989, in backward
loss.backward(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 498, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpex/kernels/RotaryPosEmbeddingHelper.py", line 157, in backward
cos, sin, position_ids = ctx.saved_tensors
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [HPUBFloat16Type [1, 1, 512, 128]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
0%| | 0/354 [00:07<?, ?it/s]
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[3851,1],5]
Exit code: 1
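For reference, the RuntimeError in the log is autograd's version-counter check: a tensor that was saved for the backward pass (here the cos/sin tensors held by the HPU rotary-embedding helper) was modified in place after the forward pass, so its version no longer matches the saved one. Below is a minimal, device-agnostic sketch of the same error class; it is purely illustrative and unrelated to the actual Habana kernel.

```python
import torch

cache = torch.randn(4)                 # e.g. a cos/sin-style buffer reused across steps
x = torch.randn(4, requires_grad=True)

y = x * cache                          # autograd saves `cache` to compute dL/dx later
cache.add_(1.0)                        # in-place update bumps `cache`'s version counter

try:
    y.sum().backward()                 # backward detects the version mismatch and raises
except RuntimeError as err:
    print(err)                         # "... has been modified by an inplace operation ..."

# As the hint in the log suggests, torch.autograd.set_detect_anomaly(True) makes autograd
# record forward-pass stack traces, which helps locate the offending in-place operation.
```

This matches the report above that the error only appears in the multi-card DDP run, where the saved rotary cos/sin tensors end up at a newer version than backward expects.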