
Conversation

@helena-intel
Collaborator

This is #1297, updated to the latest main branch.

Currently, inference on Phi-3-mini and Phi-4-mini returns bad outputs (random characters) when the context gets larger than about 2000 tokens. This PR, contributed by @eaidova, fixes that. This is not my code; the original PR is no longer being updated, so I'm opening this new PR to make it easier to discuss and add updates.

I saw no negative impact on inference speed. I see slightly different outputs with shorter contexts on SPR (on inference with the model exported with the PR vs the model exported with main). Any suggestions to fix that would be much appreciated.
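
For anyone who wants to reproduce the issue or check the fix, a minimal sketch (the checkpoint and prompt construction here are assumptions; any prompt over roughly 2000 tokens should trigger the bad outputs without this PR):

```python
# Minimal reproduction sketch; model id and prompt length are assumptions.
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "microsoft/Phi-3-mini-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

# Build a prompt well over ~2000 tokens to cross the problematic threshold.
prompt = "The quick brown fox jumps over the lazy dog. " * 300
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```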

Draft PR for now, awaiting some feedback and testing, but I hope we can merge this soon.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@helena-intel helena-intel added the openvino-slow Runs OpenVINO slow tests with different versions of transformers label Oct 30, 2025
return attn_output, None, past_key_value


# @torch.jit.script
Collaborator

I think it makes sense to add a test with a long prompt. Can this issue be reproduced on the tiny model?
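
For example, such a test could look roughly like this (the test name, checkpoint, prompt length, and assertion are assumptions; a tiny random Phi-3 checkpoint would only help if the issue reproduces there):

```python
# Hypothetical test sketch; names, model id, and thresholds are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel import OVModelForCausalLM


def test_phi3_long_prompt_matches_reference():
    model_id = "microsoft/Phi-3-mini-128k-instruct"  # or a tiny Phi-3 checkpoint, if the issue reproduces there
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("word " * 2500, return_tensors="pt")  # well over ~2000 tokens

    ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True)
    ref_model = AutoModelForCausalLM.from_pretrained(model_id)

    ov_out = ov_model.generate(**inputs, max_new_tokens=20, do_sample=False)
    ref_out = ref_model.generate(**inputs, max_new_tokens=20, do_sample=False)
    assert tokenizer.decode(ov_out[0]) == tokenizer.decode(ref_out[0])
```

The exact-match assertion may need to be relaxed given the small SPR differences mentioned above.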

@nikita-savelyevv
Collaborator

> I see slightly different outputs with shorter contexts on SPR (on inference with the model exported with the PR vs the model exported with main).

I believe minor differences are expected on SPR. But if possible, WWB similarity should be run to see if the difference is significant or not.

@helena-intel helena-intel marked this pull request as ready for review October 31, 2025 10:24
return attn_output, None, past_key_value


# @torch.jit.script
Collaborator

Please remove unneeded comments and commented-out code.

):
self._model.config.max_position_embeddings = self._model.config.original_max_position_embeddings

# currently, long RoPE can not be traced for long context support, disable it to avoid potential accuracy issues
Collaborator

Now I think we don't need this comment, since the problem is solved by this PR.

if hasattr(self, "max_position_embeddings")
else self.config.max_position_embeddings
)
inv_freq = select_ext_factor(seq_len, original_max_position_embeddings, self.inv_freq, self.long_inv_freq)
@rkazants (Collaborator) Nov 3, 2025

Let us add a comment:

Slow down all frequencies by a scale factor for long prompts; this makes attention more stable and preserves model accuracy.
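
Roughly where it could go (a sketch; the signature matches the helper shown in this diff, the comment wording follows the suggestion above):

```python
import torch


def select_ext_factor(seq_len, max_pos_embeddings, short_factor, long_factor):
    # Slow down all frequencies by the scale factor for long prompts: switching to
    # the long inverse frequencies makes attention more stable and preserves accuracy.
    return torch.where(seq_len <= max_pos_embeddings, short_factor, long_factor)
```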

elif self._model.config.max_position_embeddings != getattr(
self._model.config, "original_max_position_embeddings", self._model.config.max_position_embeddings
):
self._model.config.max_position_embeddings = self._model.config.original_max_position_embeddings
Collaborator

Shall we save the original value of max_position_embeddings and restore it in the __exit__ method?
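
A sketch of that suggestion (attribute name and exact placement are assumptions; this would sit inside the existing patcher class mentioned later in this thread):

```python
# Hypothetical sketch: remember the original value in __enter__ and restore it in __exit__.
class Phi3ModelPatcher(OVDecoderModelPatcher):
    def __enter__(self):
        super().__enter__()
        config = self._model.config
        self._orig_max_position_embeddings = config.max_position_embeddings
        if getattr(config, "original_max_position_embeddings", None) is not None:
            config.max_position_embeddings = config.original_max_position_embeddings

    def __exit__(self, exc_type, exc_value, traceback):
        super().__exit__(exc_type, exc_value, traceback)
        self._model.config.max_position_embeddings = self._orig_max_position_embeddings
```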

logits_to_keep=None,
**kwargs,
):
# Overwritten -- this model may need to switch between short and long rope, invalidating the cache in the
Collaborator

Am I correct that we have a problem when there are short and long prompts in consecutive generate calls? We can't re-initialize inv_freqs from long_inv_freqs to short_inv_freqs and vice versa, so how is this problem solved?

self._model.model._orig_forward = self._model.model.forward
self._model.model.forward = types.MethodType(phi3_442_forward, self._model.model)

# init inv_freq for torchscript tracing for PhiMoE
Collaborator

This comment about torchscript tracing seems out of place here. Please revise.

@echarlaix (Collaborator) left a comment

Thanks a lot @helena-intel !!



class OVPhi3ForCausalLM(OVModelForCausalLM):
def prepare_inputs_for_generation(

Comment on lines -1593 to +1648
super().__enter__()
# Call OVDecoderModelPatcher.__enter__() directly to skip Phi3ModelPatcher's longrope logic
# PhiMoE has a different rotary embedding structure, longrope is not yet supported
Collaborator

Why do we need to add all these modifications to PhiMoEModelPatcher? If longrope is not yet supported, then self._model.model.rotary_emb will never be set to "longrope". If we want to make sure, we can raise an error in case that ever happens.

Collaborator Author

Initially, tests failed for phi_moe, see https://github.com/huggingface/optimum-intel/actions/runs/18952102871/job/54119192964 . We should have longrope support for the MoE model too, but not in this PR. I would be happy with a simpler solution that does not enable longrope for the MoE model (but keeps it working as it does now).
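
One simpler option in that spirit might be a guard instead of the extra patching (a sketch; the exact config fields checked are assumptions):

```python
# Hypothetical sketch: leave PhiMoE on the existing path and fail loudly if a
# longrope-scaled checkpoint ever shows up, rather than patching its rotary embedding.
rope_scaling = getattr(self._model.config, "rope_scaling", None) or {}
if rope_scaling.get("rope_type", rope_scaling.get("type")) == "longrope":
    raise NotImplementedError("longrope is not yet supported for PhiMoE export")
```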

return torch.where(seq_len <= max_pos_embeddings, short_factor, long_factor)


def long_rope(self, x, position_ids, seq_len=None):

scaling_factor = 1.0
else:
scaling_factor = math.sqrt(1 + math.log(scale) / math.log(original_max_position_embeddings))
cos = emb.cos() * scaling_factor
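
For context, with the values from the public Phi-3-mini 128k config (original_max_position_embeddings = 4096, max_position_embeddings = 131072, so scale = 32; these numbers are assumptions about the checkpoint, not taken from this diff), the attention scaling factor comes out close to 1.19:

```python
import math

original_max_position_embeddings = 4096   # assumed Phi-3-mini 128k value
max_position_embeddings = 131072          # assumed Phi-3-mini 128k value
scale = max_position_embeddings / original_max_position_embeddings  # 32.0

scaling_factor = math.sqrt(1 + math.log(scale) / math.log(original_max_position_embeddings))
print(scaling_factor)  # ~1.19; cos/sin are scaled up slightly for long contexts
```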

Comment on lines +1519 to +1522
# Force float32 since bfloat16 loses precision on long contexts
# See https://github.com/huggingface/transformers/pull/29285
device_type = x.device.type
device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"
Collaborator

device_type is not used here. Also, should we ensure fp32 dtype?
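
For reference, in the upstream transformers rotary embedding (after the PR linked above), device_type is used to disable autocast so the cos/sin are computed in float32. A self-contained sketch of that pattern (function name and shapes are assumptions, adapted from transformers rather than from this diff):

```python
import torch


def rope_cos_sin(x, inv_freq, position_ids, scaling_factor=1.0):
    # (batch, dim/2, 1) @ (batch, 1, seq) -> per-position rotation angles.
    inv_freq_expanded = inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
    position_ids_expanded = position_ids[:, None, :].float()
    device_type = x.device.type
    device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"
    # Disable autocast so cos/sin stay in float32 even when the model runs in bf16/fp16.
    with torch.autocast(device_type=device_type, enabled=False):
        freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)
        emb = torch.cat((freqs, freqs), dim=-1)
        cos = emb.cos() * scaling_factor
        sin = emb.sin() * scaling_factor
    return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
```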
