Request to Add CacheCraft: A Relevant Work on Chunk-Aware KV Cache Reuse for RAG
Thanks for this great list! We’d love to add CacheCraft, a chunk-aware KV cache reuse approach for RAG that minimizes redundant computation while preserving generation quality. Our work is concurrent with CacheBlend; the key differences are chunk-level reuse, selective recompute planning, and optimizations designed for real-world production systems. CacheCraft has been accepted at SIGMOD 2025.
We’re also open-sourcing a vLLM-based extension soon. Results on real RAG traces show strong efficiency gains in production. Recent works like CacheFocus and EPIC further build on related ideas, highlighting the growing relevance of this research direction.
README.md (+1)
@@ -86,6 +86,7 @@ description: >-
|[CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving](https://arxiv.org/pdf/2310.07240)| KV Cache compression | University of Chicago | sigcomm |
|[SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation](https://arxiv.org/pdf/2412.13649)| Separate handling of prefill and decoding KV Cache | SEU | arxiv 2024 |