
Commit 84a98b3 ("upd")

1 parent 5d139bb


_posts/2025-03-10-sampling.md (4 additions, 1 deletion)
```diff
@@ -169,7 +169,10 @@ Our evaluation demonstrates that FlashInfer's sampling kernel delivers substanti
 </p>
 
 ## Community Adoption and Other Applications
-The FlashInfer sampling kernel has been widely adopted by several prominent frameworks, including [sglang](https://github.com/sgl-project/sglang) and [vLLM](https://github.com/vllm-project/vllm/pull/7137). We are grateful for the community's valuable feedback and bug reports that have helped improve the implementation. Beyond sampling, the core ideas behind our approach have broader applications, particularly in speculative decoding verification. This includes techniques like [chain speculative sampling](https://arxiv.org/pdf/2302.01318) and [tree speculative verification](https://arxiv.org/pdf/2305.09781).
+
+The FlashInfer sampling kernel has gained widespread adoption across major LLM frameworks, including [MLC-LLM](https://github.com/mlc-ai/mlc-llm), [sglang](https://github.com/sgl-project/sglang), and [vLLM](https://github.com/vllm-project/vllm/pull/7137). The community's active engagement through feedback and bug reports has been instrumental in refining and improving our implementation.
+
+Beyond token sampling, our approach's core principles have proven valuable in other areas of LLM inference optimization. For instance, our techniques have been particularly impactful in speculative decoding verification, as demonstrated in methods like [chain speculative sampling](https://arxiv.org/pdf/2302.01318) and [tree speculative verification](https://arxiv.org/pdf/2305.09781). Building on these foundations, recent innovations like [Twilight](https://github.com/tsinghua-ideal/Twilight) have further advanced the field by successfully combining top-p sampling with sparse attention in a unified approach.
 
 ## Implementation Details
 
```
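For readers unfamiliar with the chain speculative sampling referenced in the added paragraph, the sketch below illustrates its accept/reject rule (Chen et al., 2023) in plain NumPy. This is an illustrative reconstruction under our own assumptions, not FlashInfer's kernel; the function name `chain_speculative_verify` and the array layout are hypothetical.

```python
import numpy as np

def chain_speculative_verify(draft_tokens, q_probs, p_probs, rng):
    """Illustrative accept/reject loop for chain speculative sampling
    (Chen et al., 2023); not FlashInfer's actual implementation.

    draft_tokens: (n,) token ids proposed by the draft model
    q_probs:      (n, vocab) draft-model distribution at each draft position
    p_probs:      (n + 1, vocab) target-model distribution at each position,
                  with one extra row for the bonus token
    Returns the accepted prefix plus exactly one freshly sampled token.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        # Accept with probability min(1, p(tok) / q(tok)); writing the test
        # as u * q < p avoids dividing by a possibly tiny q.
        if rng.random() * q_probs[i, tok] < p_probs[i, tok]:
            out.append(int(tok))
            continue
        # Rejected: resample from the residual max(0, p - q), renormalized.
        # This correction makes the output exactly target-distributed.
        residual = np.maximum(p_probs[i] - q_probs[i], 0.0)
        residual /= residual.sum()
        out.append(int(rng.choice(residual.size, p=residual)))
        return out
    # All drafts accepted: sample one bonus token from the extra target row.
    out.append(int(rng.choice(p_probs[-1].size, p=p_probs[-1])))
    return out

# Hypothetical usage with random distributions, for shape illustration only.
rng = np.random.default_rng(0)
vocab, n = 32, 4
q = rng.dirichlet(np.ones(vocab), size=n)
p = rng.dirichlet(np.ones(vocab), size=n + 1)
drafts = np.array([rng.choice(vocab, p=q[i]) for i in range(n)])
print(chain_speculative_verify(drafts, q, p, rng))
```

The comparison `rng.random() * q < p` accepts with probability min(1, p/q) without an explicit division, and resampling from the renormalized residual max(0, p - q) is what makes the combined procedure match sampling from the target model exactly.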
