@justinchuby
Collaborator

When torch.onnx exports a transformers model that uses SDPA, it emits Concat nodes
that join past_key/past_value with the new key/value to produce the graph outputs
for the KV cache. This pattern can be fused into the Attention node, which has present_key
and present_value outputs. The fusion is necessary for ONNX Runtime, which requires these
outputs to be produced by the Attention node whenever past_key and past_value inputs are provided.
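
As a rough illustration of the fusion described above, here is a minimal sketch on a toy graph. The `Node` class and the `fuse_present_kv` and `_find_concat` helpers are hypothetical stand-ins written for this example; they are not the onnxscript rewriter API and not the rule added in this PR. The sketch assumes the Attention inputs are ordered Q, K, V, attn_mask, past_key, past_value, and it skips checks a real rule would need, such as verifying the Concat axis.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy graph node; a stand-in for illustration, not the onnxscript IR."""

    op_type: str
    inputs: list[str]
    outputs: list[str]


def _find_concat(nodes, past, new):
    """Return the Concat node that joins `past` with `new`, if any."""
    for node in nodes:
        if node.op_type == "Concat" and node.inputs == [past, new]:
            return node
    return None


def fuse_present_kv(nodes):
    """Make Attention emit present_key/present_value instead of separate Concats.

    Mutates the matching Attention nodes and returns a new node list
    without the Concat nodes that were folded in.
    """
    removed = set()
    for attn in nodes:
        # Only handle Attention nodes that read a cache but emit just Y.
        if attn.op_type != "Attention" or len(attn.inputs) < 6:
            continue
        if len(attn.outputs) != 1:
            continue
        # Assumed input order: Q, K, V, attn_mask, past_key, past_value.
        _, key, value, _, past_key, past_value = attn.inputs[:6]
        if not past_key or not past_value:
            continue
        k_cat = _find_concat(nodes, past_key, key)
        v_cat = _find_concat(nodes, past_value, value)
        if k_cat is None or v_cat is None:
            continue
        # Reuse the Concat output names so graph outputs stay unchanged.
        attn.outputs = [attn.outputs[0], k_cat.outputs[0], v_cat.outputs[0]]
        removed.update({id(k_cat), id(v_cat)})
    return [node for node in nodes if id(node) not in removed]


graph = [
    Node("Attention", ["q", "k", "v", "", "past_key", "past_value"], ["y"]),
    Node("Concat", ["past_key", "k"], ["present_key"]),
    Node("Concat", ["past_value", "v"], ["present_value"]),
]
fused = fuse_present_kv(graph)
print([node.op_type for node in fused])
print(fused[0].outputs)
```

After the rewrite, the toy graph contains only the Attention node, and the tensors that were graph outputs of the Concat nodes are produced by Attention directly.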

Signed-off-by: Justin Chu <[email protected]>
@codecov

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 73.68421% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.39%. Comparing base (811937c) to head (c408516).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ipt/rewriter/rules/fusion/_attention_present_kv.py | 76.47% | 4 Missing ⚠️ |
| onnxscript/rewriter/onnx_fusions/_onnx_fusions.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2632   +/-   ##
=======================================
  Coverage   70.38%   70.39%           
=======================================
  Files         222      223    +1     
  Lines       26288    26309   +21     
  Branches     2629     2629           
=======================================
+ Hits        18503    18519   +16     
- Misses       6865     6870    +5     
  Partials      920      920           


gramalingam pushed a commit that referenced this pull request Oct 15, 2025
Output present key/value from the Attention op whenever past key/value is
provided. Previously, the Attention op created by the fusion would consume
past key/value but not produce present key/value, which is not valid for
ORT.

<img width="1377" height="1225" alt="image"
src="https://github.com/user-attachments/assets/118958b4-bc27-4912-b70b-000549887c0f"
/>

Replaces #2632

Signed-off-by: Justin Chu <[email protected]>
@justinchuby
Collaborator Author

This is still useful when enable_gqa=True

@justinchuby justinchuby reopened this Oct 16, 2025
@justinchuby justinchuby modified the milestones: 0.5.4, 0.5.5 Oct 16, 2025
@justinchuby justinchuby marked this pull request as draft October 17, 2025 20:10