Merged
```diff
@@ -139,6 +139,7 @@ std::vector<paddle::DataType> SpeculateGetPaddingOffsetInferDtype(
 PD_BUILD_STATIC_OP(speculate_get_padding_offset)
     .Inputs({"input_ids",
              "draft_tokens",
+             "cum_offsets",
              "token_num",
              "seq_len",
              "seq_lens_encoder"})
```
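One detail worth noting about the added `"cum_offsets"` entry: C++, like Python, concatenates adjacent string literals at compile time, so without a trailing comma the entry would silently merge with `"token_num"` into a single input name. A quick Python illustration of the same pitfall (list names here are illustrative):

```python
# Adjacent string literals are concatenated at parse time in Python
# (and in C/C++), so a missing comma silently merges two list entries.
inputs_ok = ["input_ids", "draft_tokens", "cum_offsets", "token_num"]
inputs_bug = ["input_ids", "draft_tokens", "cum_offsets" "token_num"]  # comma missing

print(len(inputs_ok))   # 4 entries
print(len(inputs_bug))  # 3 entries: "cum_offsetstoken_num" became one string
```

Because the merge happens with no warning by default, a mismatched input count in an op registration like this one is easy to miss until the op fails to bind its inputs.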
docs/zh/offline_inference.md: 2 changes (1 addition, 1 deletion)
````diff
@@ -35,7 +35,7 @@ for output in outputs:

 In the example above, the ``LLM`` configuration, `SamplingParams`, `LLM.generate`, `LLM.chat`, and the `RequestOutput` struct corresponding to each output are described in the documentation below.

-> Note: for a reasoning model, the `resoning_parser` parameter must be specified when loading the model; thinking can then be toggled per request via the `enable_thinking` option in `chat_template_kwargs`.
+> Note: for a reasoning model, the `reasoning_parser` parameter must be specified when loading the model; thinking can then be toggled per request via the `enable_thinking` option in `chat_template_kwargs`.

 ```python
 from fastdeploy.entrypoints.llm import LLM
````