Commit c69a0a0

Request to Add CacheCraft: A Relevant Work on Chunk-Aware KV Cache Reuse for RAG
Thanks for this great list! We'd love to add CacheCraft, a chunk-aware KV cache reuse approach for RAG that minimizes redundant computation while preserving generation quality. Our work is concurrent with CacheBlend; the key differences are chunk-level reuse, selective recompute planning, and optimizations designed for real-world production systems. CacheCraft has been accepted at SIGMOD 2025, and we plan to open-source a vLLM-based extension soon. Results on real RAG traces show strong efficiency gains in production. Recent works such as CacheFocus and EPIC build on related ideas, highlighting the growing relevance of this research direction.
1 parent ea9ad64 commit c69a0a0

1 file changed

+1
-0
lines changed

README.md

@@ -86,6 +86,7 @@ description: >-
| [CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving](https://arxiv.org/pdf/2310.07240) | KV Cache compression | University of Chicago | SIGCOMM |
| [SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation](https://arxiv.org/pdf/2412.13649) | Separate handling of prefill and decoding KV Cache | SEU | arxiv 2024 |
| [FASTDECODE: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines](https://arxiv.org/pdf/2403.11421) | Heterogeneous pipelines | THU | arxiv 2024 |
+| [Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation](https://www.arxiv.org/pdf/2502.15734) | Approximate Chunked KV Reuse | Adobe Research | SIGMOD 2025 |