Prerequisites
Problem Description
We'd like to integrate FlashInfer into the project to improve decoding and prefill efficiency.
For decode, we need to port the single-request decode attention logic documented here:
https://docs.flashinfer.ai/generated/flashinfer.decode.single_decode_with_kv_cache.html
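To make the decode target concrete, here is a minimal pure-Python reference of the computation a single-request decode step performs: one new query token attending over the cached keys and values. It is only a sketch for checking a port against, not FlashInfer's implementation. The function name `reference_single_decode` is hypothetical, and the shapes (`q` as `[num_qo_heads][head_dim]`, `k`/`v` as `[kv_len][num_kv_heads][head_dim]`, i.e. an NHD layout) and the default `1/sqrt(head_dim)` scale are assumptions based on the linked page.

```python
import math


def reference_single_decode(q, k, v, sm_scale=None):
    """Reference attention for one decode step (hypothetical helper).

    q:    [num_qo_heads][head_dim]          query for the single new token
    k, v: [kv_len][num_kv_heads][head_dim]  cached keys/values (NHD layout)
    Returns [num_qo_heads][head_dim].
    """
    num_qo_heads = len(q)
    head_dim = len(q[0])
    kv_len = len(k)
    num_kv_heads = len(k[0])
    if num_qo_heads % num_kv_heads != 0:
        raise ValueError("num_qo_heads must be a multiple of num_kv_heads")
    # Grouped-query attention: several query heads share one KV head.
    group_size = num_qo_heads // num_kv_heads
    if sm_scale is None:
        sm_scale = 1.0 / math.sqrt(head_dim)  # assumed default softmax scale

    out = []
    for h in range(num_qo_heads):
        kv_h = h // group_size
        # Scaled dot product of the query with every cached key.
        logits = [
            sm_scale * sum(q[h][d] * k[t][kv_h][d] for d in range(head_dim))
            for t in range(kv_len)
        ]
        # Numerically stable softmax over the KV positions.
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]
        denom = sum(weights)
        # Weighted sum of the cached values.
        out.append([
            sum(weights[t] * v[t][kv_h][d] for t in range(kv_len)) / denom
            for d in range(head_dim)
        ])
    return out
```

A ported kernel could be compared against this reference on small random inputs, within a floating-point tolerance, before benchmarking.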
For prefill, we need a full integration of the required FlashInfer components.
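For comparison, prefill processes every prompt token at once, usually under a causal mask. The sketch below is a pure-Python reference of that computation, meant only as a correctness baseline for whatever integration is chosen. The function name `reference_single_prefill` is hypothetical, the shapes (`q` as `[qo_len][num_qo_heads][head_dim]`, `k`/`v` as `[kv_len][num_kv_heads][head_dim]`) are assumptions, and so is the choice to align the causal mask to the end of the KV sequence when `kv_len > qo_len`.

```python
import math


def reference_single_prefill(q, k, v, causal=True, sm_scale=None):
    """Reference attention for a prefill pass (hypothetical helper).

    q:    [qo_len][num_qo_heads][head_dim]
    k, v: [kv_len][num_kv_heads][head_dim]
    Returns [qo_len][num_qo_heads][head_dim].
    """
    qo_len, kv_len = len(q), len(k)
    num_qo_heads, head_dim = len(q[0]), len(q[0][0])
    num_kv_heads = len(k[0])
    if num_qo_heads % num_kv_heads != 0:
        raise ValueError("num_qo_heads must be a multiple of num_kv_heads")
    if causal and kv_len < qo_len:
        raise ValueError("causal prefill expects kv_len >= qo_len")
    group_size = num_qo_heads // num_kv_heads
    if sm_scale is None:
        sm_scale = 1.0 / math.sqrt(head_dim)  # assumed default softmax scale

    out = []
    for i in range(qo_len):
        # Assumption: query i sits at position kv_len - qo_len + i, so it
        # may attend to that position and everything before it.
        visible = kv_len - qo_len + i + 1 if causal else kv_len
        row = []
        for h in range(num_qo_heads):
            kv_h = h // group_size
            logits = [
                sm_scale * sum(q[i][h][d] * k[t][kv_h][d] for d in range(head_dim))
                for t in range(visible)
            ]
            # Numerically stable softmax over the visible positions only.
            m = max(logits)
            weights = [math.exp(x - m) for x in logits]
            denom = sum(weights)
            row.append([
                sum(weights[t] * v[t][kv_h][d] for t in range(visible)) / denom
                for d in range(head_dim)
            ])
        out.append(row)
    return out
```

With `qo_len == 1` and `causal=True` this reduces to the decode case, which is one reason the two paths could share a test harness.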
The goal is to use FlashInfer's optimized kernels for faster inference.
Proposed Solution
Alternatives Considered
No response
Additional Context
No response
Importance
Important
Usage Statistics (Optional)
No response