
Commit f8488b1

Update 2025-09-07 22:12:08
1 parent 1e857b8 · commit f8488b1


47 files changed (+5409, -5345 lines)

_sources/advanced_features/lora.ipynb

Lines changed: 158 additions & 166 deletions

_sources/advanced_features/separate_reasoning.ipynb

Lines changed: 84 additions & 84 deletions

_sources/advanced_features/speculative_decoding.ipynb

Lines changed: 387 additions & 379 deletions

_sources/advanced_features/structured_outputs.ipynb

Lines changed: 122 additions & 117 deletions

_sources/advanced_features/structured_outputs_for_reasoning_models.ipynb

Lines changed: 173 additions & 164 deletions

_sources/advanced_features/tool_parser.ipynb

Lines changed: 157 additions & 170 deletions

_sources/advanced_features/vlm_query.ipynb

Lines changed: 261 additions & 254 deletions

_sources/basic_usage/deepseek.md

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@ Overall, with these optimizations, we have achieved up to **7x** acceleration in
   <img src="https://lmsys.org/images/blog/sglang_v0_3/deepseek_mla.svg" alt="Multi-head Latent Attention for DeepSeek Series Models">
 </p>
 
-**Usage**: MLA optimization is enabled by default. For MLA models on Blackwell architecture (e.g., B200), the default backend is FlashInfer. To use the optimized TRTLLM MLA backend for decode operations, explicitly specify `--attention-backend trtllm_mla`. Note that TRTLLM MLA only optimizes decode operations - prefill operations (including multimodal inputs) will fall back to FlashInfer MLA.
+**Usage**: MLA optimization is enabled by default. For MLA models on Blackwell architecture (e.g., B200), the default backend is FlashInfer. To use the optimized TRTLLM MLA backend for prefill and decode operations, explicitly specify `--attention-backend trtllm_mla`.
 
 **Reference**: Check [Blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/#deepseek-multi-head-latent-attention-mla-throughput-optimizations) and [Slides](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/lmsys_1st_meetup_deepseek_mla.pdf) for more details.
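
For context on the flag touched by this hunk, here is a minimal, hypothetical launch sketch using the documented `--attention-backend trtllm_mla` option; the model path and tensor-parallel size below are illustrative placeholders and are not part of the commit:

```bash
# Hedged example: select the TRTLLM MLA attention backend on Blackwell (e.g., B200) GPUs.
# "deepseek-ai/DeepSeek-V3" and "--tp 8" are placeholders; adjust to your own deployment.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --attention-backend trtllm_mla \
  --trust-remote-code \
  --tp 8
```

Omitting `--attention-backend` keeps the default FlashInfer MLA backend described in the updated paragraph.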

_sources/basic_usage/native_api.ipynb

Lines changed: 178 additions & 184 deletions

_sources/basic_usage/offline_engine_api.ipynb

Lines changed: 471 additions & 441 deletions
