@Prithviraj-R commented Nov 11, 2025

[GPU] Enable BY_CHANNEL de-quantize/re-quantize support in KV_CACHE_ROTATE OCL kernel

Description of the issue (symptom, root cause, how it was resolved)

  • The existing pa_kv_cache_rotate_ref.cl kernel handles RoPE (Rotary Position Embedding) rotation of the key cache.
  • To compute the rotation, the kernel must first de-quantize the key cache contents on the fly.
  • The de-quantization logic depends on whether "by-token" or "by-channel" KV cache quantization was used in the preceding kv_cache_update stage.
  • This PR adds "by-channel" de-quantization before the rotation and "by-channel" re-quantization after it.
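To illustrate how the two de-quantization schemes differ, here is a minimal Python sketch. It is not the OpenCL kernel: it assumes a single head's key-cache block stored as `block[token][channel]` 8-bit codes and asymmetric quantization `x = (q - zp) * scale`; the function names are hypothetical.

```python
# Illustrative model only (not pa_kv_cache_rotate_ref.cl). Assumed layout:
# block[token][channel] holds 8-bit codes; x = (q - zero_point) * scale.

def dequant_by_token(block, scales, zps):
    """By-token: one (scale, zp) pair per token row, indexed by token."""
    return [[(q - zps[t]) * scales[t] for q in row]
            for t, row in enumerate(block)]


def dequant_by_channel(block, scales, zps):
    """By-channel: one (scale, zp) pair per channel column, shared by all tokens in the block."""
    return [[(q - zps[c]) * scales[c] for c, q in enumerate(row)]
            for row in block]


codes = [[10, 20], [30, 40]]  # 2 tokens x 2 channels
print(dequant_by_token(codes, scales=[0.5, 2.0], zps=[10, 0]))    # [[0.0, 5.0], [60.0, 80.0]]
print(dequant_by_channel(codes, scales=[0.5, 2.0], zps=[10, 0]))  # [[0.0, 40.0], [10.0, 80.0]]
```

The same codes and parameter arrays give different values because the scale/zero-point lookup is indexed by token in one case and by channel in the other, which is why the kernel needs a separate code path for each scheme.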

The code and line that caused this issue (if it is not changed directly)

  • intel_gpu/src/graph/impls/ocl_v2/pa_kv_cache_rotate_ref.cl

Reproduction step and snapshot (if applicable. Do not attach for customer model)

  • NA

Problematic graph

  • NA

Checklist

  • Is it a proper fix? (not a workaround)
  • Did you include test case for this fix, if necessary?
  • Did you review existing tests that can be extended to cover this scenario? Which test did you review? None; new tests were added to cover the functionality this PR adds.

Tickets:

@Prithviraj-R requested review from a team as code owners November 11, 2025 12:45
@github-actions bot added the category: GPU OpenVINO GPU plugin label Nov 11, 2025
@Prithviraj-R force-pushed the kv_cache_rotate_for_by_channel_quantization branch from e08e74e to 7a98306 November 19, 2025 05:12
…OTATE OCL kernel

### Details:
1. The existing kernel handles RoPE (Rotary Position Embedding) rotation of the key cache.
2. To do this, it reads the (quantized) key cache contents and applies the rotation coefficients (cos, sin) provided as inputs.
3. To compute the rotation, the kernel must first de-quantize the key cache contents on the fly.
4. The de-quantization logic depends on whether "by-token" or "by-channel" KV cache quantization was used in the preceding kv_cache_update stage.
5. This PR adds "by-channel" de-quantization before the rotation and "by-channel" re-quantization after it.
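The de-quantize, rotate and re-quantize flow described in these steps can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the kernel's code. The "rotate-half" channel pairing `(i, i + D/2)`, the per-token cos/sin lists, the 8-bit asymmetric quantization with a min/max-derived scale and float zero point, and all function names are hypothetical.

```python
import math

# Assumptions (not taken from the kernel): rotate-half RoPE pairing,
# 8-bit asymmetric by-channel quantization, x = (q - zp) * scale.

def rotate_token(x, cos, sin):
    """Rotate channel pairs (i, i + D/2) of one token; cos/sin have D/2 entries."""
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        x0, x1 = x[i], x[i + half]
        out[i] = x0 * cos[i] - x1 * sin[i]
        out[i + half] = x0 * sin[i] + x1 * cos[i]
    return out


def quant_by_channel(block, levels=255):
    """Recompute one (scale, zp) per channel over all tokens, then quantize to [0, levels]."""
    scales, zps = [], []
    for c in range(len(block[0])):
        col = [row[c] for row in block]
        lo, hi = min(col), max(col)
        scale = (hi - lo) / levels if hi > lo else 1.0
        scales.append(scale)
        zps.append(-lo / scale)
    codes = [[min(levels, max(0, round(v / scales[c] + zps[c])))
              for c, v in enumerate(row)] for row in block]
    return codes, scales, zps


def rotate_block_by_channel(codes, scales, zps, cos_per_token, sin_per_token):
    """De-quantize by channel, rotate every token, then re-quantize by channel."""
    floats = [[(q - zps[c]) * scales[c] for c, q in enumerate(row)] for row in codes]
    rotated = [rotate_token(row, cos_per_token[t], sin_per_token[t])
               for t, row in enumerate(floats)]
    return quant_by_channel(rotated)


# Usage: quantize a 2-token x 2-channel block, then rotate both tokens by 90 degrees.
codes, scales, zps = quant_by_channel([[0.0, 1.0], [2.0, -1.0]])
angle = math.pi / 2
new_codes, new_scales, new_zps = rotate_block_by_channel(
    codes, scales, zps,
    cos_per_token=[[math.cos(angle)]] * 2,
    sin_per_token=[[math.sin(angle)]] * 2)
print(new_codes, new_scales, new_zps)
```

In this sketch, by-channel re-quantization recomputes every channel's scale and zero point across all tokens of the block after rotation. The rotation mixes channel `i` with channel `i + D/2`, so every channel's value range can change. With by-token scaling, each token could instead be re-quantized on its own.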

### Tickets:
- *CVS-170994*
@Prithviraj-R force-pushed the kv_cache_rotate_for_by_channel_quantization branch from a3f8e79 to 4523fd7 November 20, 2025 09:03