Merged
```diff
@@ -139,6 +139,7 @@ std::vector<paddle::DataType> SpeculateGetPaddingOffsetInferDtype(
 PD_BUILD_STATIC_OP(speculate_get_padding_offset)
     .Inputs({"input_ids",
              "draft_tokens",
+             "cum_offsets",
              "token_num",
              "seq_len",
              "seq_lens_encoder"})
```
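One detail worth noting about the added `"cum_offsets"` entry: C++, like Python, concatenates adjacent string literals at compile time, so without a trailing comma the entry would silently merge with `"token_num"` into a single input name. A quick Python illustration of the same pitfall (list names here are illustrative):

```python
# Adjacent string literals are concatenated at parse time in Python
# (and in C/C++), so a missing comma silently merges two list entries.
inputs_ok = ["input_ids", "draft_tokens", "cum_offsets", "token_num"]
inputs_bug = ["input_ids", "draft_tokens", "cum_offsets" "token_num"]  # comma missing

print(len(inputs_ok))   # 4 entries
print(len(inputs_bug))  # 3 entries: "cum_offsetstoken_num" became one string
```

Because the merge happens with no warning by default, a mismatched input count in an op registration like this one is easy to miss until the op fails to bind its inputs.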
docs/zh/offline_inference.md: 2 changes (1 addition, 1 deletion)
````diff
@@ -35,7 +35,7 @@ for output in outputs:

 In the example above, the ``LLM`` configuration, `SamplingParams`, `LLM.generate`, `LLM.chat`, and the `RequestOutput` struct corresponding to each output are described in the documentation below.

-> Note: for a reasoning model, the `resoning_parser` parameter must be specified when loading the model; thinking can then be toggled per request via the `enable_thinking` option in `chat_template_kwargs`.
+> Note: for a reasoning model, the `reasoning_parser` parameter must be specified when loading the model; thinking can then be toggled per request via the `enable_thinking` option in `chat_template_kwargs`.

 ```python
 from fastdeploy.entrypoints.llm import LLM
````