Skip to content

Update speculative_decoding.ipynb on description about EAGLE-2 and EAGLE-3 #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions backend/speculative_decoding.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,16 @@
"source": [
"# Speculative Decoding\n",
"\n",
"SGLang now provides an EAGLE2-based speculative decoding option. Our implementation aims to maximize speed and efficiency and is considered to be among the fastest in open-source LLM engines.\n",
"SGLang now provides an EAGLE-based (EAGLE-2/EAGLE-3) speculative decoding option. Our implementation aims to maximize speed and efficiency and is considered to be among the fastest in open-source LLM engines.\n",
"\n",
"**Note:** Currently, Speculative Decoding in SGLang is compatible with radix cache and chunked prefill.\n",
"\n",
"### Performance Highlights\n",
"\n",
"- Official EAGLE code ([SafeAILab/EAGLE](https://github.com/SafeAILab/EAGLE)): ~200 tokens/s\n",
"- Official EAGLE (EAGLE-2) code ([SafeAILab/EAGLE](https://github.com/SafeAILab/EAGLE)): ~200 tokens/s\n",
"- Standard SGLang Decoding: ~156 tokens/s\n",
"- EAGLE Decoding in SGLang: ~297 tokens/s\n",
"- EAGLE Decoding in SGLang (w/ `torch.compile`): ~316 tokens/s\n",
"- EAGLE (EAGLE-2) Decoding in SGLang: ~297 tokens/s\n",
"- EAGLE (EAGLE-2) Decoding in SGLang (w/ `torch.compile`): ~316 tokens/s\n",
"\n",
"All benchmarks below were run on a single H100."
]
Expand Down