[GPU] Enable BY_CHANNEL de-quantize/re-quantize support in KV_CACHE_R… #32783

Prithviraj-R · 2025-11-11T12:45:25Z

[GPU] Enable BY_CHANNEL de-quantize/re-quantize support in KV_CACHE_ROTATE OCL kernel

Description of the issue(symptom, root-cause, how it was resolved)

The existing pa_kv_cache_rotate_ref.cl kernel applies RoPE (Rotary Position Embedding) rotation to the key cache.
To compute the rotation, the kernel first needs to de-quantize the key cache contents on the fly.
The de-quantization logic depends on whether "by-token" or "by-channel" KV cache quantization was used in the preceding kv_cache_update stage.
This PR adds support for "by-channel" de-quantization before rotation and "by-channel" re-quantization after rotation.
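
To illustrate the difference between the two modes, here is a minimal Python sketch (not the kernel code). It assumes asymmetric 8-bit quantization with one floating-point scale and zero-point per group, where a group is one token's row in "by-token" mode and one channel's column in "by-channel" mode. The helper names (`quantize_group`, `quantize_block`, and so on) and the `[tokens][channels]` layout are hypothetical; the actual cache layout and parameter storage in the plugin may differ.

```python
def quantize_group(values, qmax=255):
    """Asymmetric quantization of one group of floats to integers in [0, qmax]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    if scale == 0.0:
        scale = 1.0  # constant group: any non-zero scale works
    zp = -lo / scale  # zero-point kept as a float in this sketch
    q = [min(qmax, max(0, round(v / scale + zp))) for v in values]
    return q, scale, zp


def dequantize_group(q, scale, zp):
    """Inverse of quantize_group, up to rounding error."""
    return [(x - zp) * scale for x in q]


def transpose(block):
    return [list(col) for col in zip(*block)]


def _groups(block, mode):
    # by_token: each row (token) is a group; by_channel: each column is a group.
    if mode == "by_token":
        return block
    if mode == "by_channel":
        return transpose(block)
    raise ValueError(f"unknown mode: {mode}")


def quantize_block(block, mode):
    """Return one (q, scale, zp) tuple per group of a [tokens][channels] block."""
    return [quantize_group(g) for g in _groups(block, mode)]


def dequantize_block(qgroups, mode):
    """Rebuild the [tokens][channels] block from per-group tuples."""
    groups = [dequantize_group(q, s, zp) for q, s, zp in qgroups]
    return _groups(groups, mode)
```

For a block with 3 tokens and 2 channels, `quantize_block(block, "by_token")` yields 3 parameter pairs and `quantize_block(block, "by_channel")` yields 2, which is why the de-quantization step has to know which mode produced the cache.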

The code and line that caused this issue (if it is not changed directly)

intel_gpu/src/graph/impls/ocl_v2/pa_kv_cache_rotate_ref.cl

Reproduction step and snapshot (if applicable. Do not attach for customer model)

NA

Problematic graph

NA

Checklist

Is it a proper fix? (not a workaround)
Did you include test case for this fix, if necessary?
Did you review existing test that can be extended to cover this scenario? Which test did you review? None. New tests were added to cover the support this PR adds.

Tickets:

CVS-170994

src/plugins/intel_gpu/src/graph/impls/ocl_v2/pa_kv_cache_rotate_ref.cl

…OTATE OCL kernel

### Details:
1. The existing kernel handles RoPE (Rotary Position Embedding) for the key cache.
2. For this, the kernel reads the (quantized) key cache contents and applies the corresponding rotation coefficients provided as inputs (cos, sin).
3. To compute the rotation, the kernel first needs to de-quantize the key cache contents on the fly.
4. The de-quantization logic depends on whether "by-token" or "by-channel" KV cache quantization was used in the preceding kv_cache_update stage.
5. This PR adds support for "by-channel" de-quantization before rotation and "by-channel" re-quantization after rotation.

### Tickets:
- *CVS-170994*
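
The flow in the details above (de-quantize, rotate with cos/sin, re-quantize) can be sketched in Python as follows. This is an illustration under stated assumptions, not the OCL kernel: it assumes a "rotate-half" RoPE pairing (element `i` with element `i + half`), per-token cos/sin lists, asymmetric 8-bit quantization, and per-channel parameters that are recomputed after rotation because rotation changes each channel's value range. All function names here are hypothetical.

```python
def rope_rotate(vec, cos, sin):
    """Rotate-half RoPE (assumed pairing): element i is paired with i + half."""
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i in range(half):
        a, b = vec[i], vec[i + half]
        out[i] = a * cos[i] - b * sin[i]
        out[i + half] = b * cos[i] + a * sin[i]
    return out


def channel_params(column, qmax=255):
    """Scale and zero-point for one channel (column) of the block."""
    lo, hi = min(column), max(column)
    scale = (hi - lo) / qmax
    if scale == 0.0:
        scale = 1.0  # constant channel: any non-zero scale works
    return scale, -lo / scale


def rotate_block_by_channel(qblock, scales, zps, cos, sin, qmax=255):
    """qblock is [tokens][channels]; scales/zps hold one entry per channel."""
    # 1. De-quantize each element with its channel's parameters.
    deq = [[(q - zps[c]) * scales[c] for c, q in enumerate(row)] for row in qblock]
    # 2. Rotate each token with that token's coefficients.
    rot = [rope_rotate(row, cos[t], sin[t]) for t, row in enumerate(deq)]
    # 3. Recompute per-channel parameters from the rotated values.
    params = [channel_params(col, qmax) for col in zip(*rot)]
    # 4. Re-quantize with the new per-channel parameters.
    new_q = [
        [min(qmax, max(0, round(v / params[c][0] + params[c][1])))
         for c, v in enumerate(row)]
        for row in rot
    ]
    return new_q, [p[0] for p in params], [p[1] for p in params]
```

Step 3 is the part specific to "by-channel" mode in this sketch: every channel's parameters depend on all tokens in the block, so the whole block has to be rotated before any element can be re-quantized.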

Prithviraj-R requested review from a team as code owners November 11, 2025 12:45

github-actions bot added the category: GPU OpenVINO GPU plugin label Nov 11, 2025

Prithviraj-R force-pushed the kv_cache_rotate_for_by_channel_quantization branch from e08e74e to 7a98306 Compare November 19, 2025 05:12

Prithviraj-R requested review from geunhwan, p-durandin and yeonbok November 19, 2025 05:27

mklimenk reviewed Nov 20, 2025

src/plugins/intel_gpu/src/graph/impls/ocl_v2/pa_kv_cache_rotate_ref.cl (outdated)

Prithviraj-R added 3 commits November 20, 2025 01:03

Update paged_attention_opt.cpp

c909844

Update pa_kv_cache_rotate_ref.cl

4523fd7

Prithviraj-R force-pushed the kv_cache_rotate_for_by_channel_quantization branch from a3f8e79 to 4523fd7 Compare November 20, 2025 09:03

Prithviraj-R requested a review from mklimenk November 20, 2025 09:03

2 participants
