Fix DeepSeek-V2 expert-parallelism failure due to indexing error #1765
What does this PR do?
Fixes an indexing error during multi-card inference with expert parallelism on DeepSeek-V2. The indexing issue causes the following command to fail:
python ../gaudi_spawn.py --world_size=2 run_generation.py --model_name_or_path deepseek-ai/DeepSeek-V2-Lite --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --parallel_strategy "ep" --prompt "DeepSpeed is a machine learning framework"
Stack trace:
Warming up iteration 1/3
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:601: UserWarning: do_sample is set to False. However, temperature is set to 0.3 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:606: UserWarning: do_sample is set to False. However, top_p is set to 0.95 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
warnings.warn(
Setting pad_token_id to eos_token_id:None for open-end generation.
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/optimum-habana/examples/text-generation/run_generation.py", line 801, in <module>
[rank0]: main()
[rank0]: File "/root/optimum-habana/examples/text-generation/run_generation.py", line 563, in main
[rank0]: generate(None, args.reduce_recompile)
[rank0]: File "/root/optimum-habana/examples/text-generation/run_generation.py", line 534, in generate
[rank0]: outputs = model.generate(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/root/optimum-habana/optimum/habana/transformers/generation/utils.py", line 1477, in generate
[rank0]: result = self._sample(
[rank0]: File "/root/optimum-habana/optimum/habana/transformers/generation/utils.py", line 2458, in _sample
[rank0]: outputs = self(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 745, in forward
[rank0]: return wrapped_hpugraph_forward(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 610, in wrapped_hpugraph_forward
[rank0]: outputs = orig_fwd(*args, **kwargs)
[rank0]: File "/root/optimum-habana/optimum/habana/transformers/models/deepseek_v2/modeling_deepseek_v2.py", line 1918, in forward
[rank0]: outputs = self.model(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1847, in _call_impl
[rank0]: return inner()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1793, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/root/optimum-habana/optimum/habana/transformers/models/deepseek_v2/modeling_deepseek_v2.py", line 1714, in forward
[rank0]: layer_outputs = decoder_layer(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1847, in _call_impl
[rank0]: return inner()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1793, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/root/optimum-habana/optimum/habana/transformers/models/deepseek_v2/modeling_deepseek_v2.py", line 1411, in forward
[rank0]: hidden_states = self.mlp(hidden_states)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1847, in _call_impl
[rank0]: return inner()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1793, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/root/optimum-habana/optimum/habana/transformers/models/deepseek_v2/modeling_deepseek_v2.py", line 700, in forward
[rank0]: htcore.mark_step()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/utils/internal.py", line 36, in lazy_wrapper
[rank0]: func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/step_closure.py", line 71, in mark_step
[rank0]: htcore._mark_step(device_str, sync)
[rank0]: RuntimeError: synNodeCreateWithId failed for node: moe_bf16 with synStatus 26 [Generic failure]. .