Commit 7027b27

server: update cache_prompt documentation [no ci] (ggml-org#7745)
1 parent a5cabd7 commit 7027b27

1 file changed, +1 -1 lines changed
examples/server/README.md

@@ -279,7 +279,7 @@ node index.js
 
 `id_slot`: Assign the completion task to an specific slot. If is -1 the task will be assigned to a Idle slot. Default: `-1`
 
-`cache_prompt`: Re-use previously cached prompt from the last request if possible. This may prevent re-caching the prompt from scratch. Default: `false`
+`cache_prompt`: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are **not** guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation) enabling this option can cause nondeterministic results. Default: `false`
 
 `system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)
 

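The prefix reuse described in the new `cache_prompt` text can be illustrated with a minimal sketch. This is not the server's actual implementation: `split_common_prefix` and `completion_body` are hypothetical helper names, and the token ids are made up. It only shows the idea that tokens shared with the previous request can be skipped, and how a request body for the server's `/completion` endpoint might enable the option.

```python
import json


def split_common_prefix(cached_tokens, new_tokens):
    """Return (n_reused, suffix): shared prefix length and the tokens still to process."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n, new_tokens[n:]


def completion_body(prompt, cache_prompt=True, n_predict=64):
    """Build a JSON body for POST /completion (hypothetical helper, no network I/O)."""
    return json.dumps(
        {"prompt": prompt, "n_predict": n_predict, "cache_prompt": cache_prompt}
    )


# Hypothetical token ids: the second request extends the first by two tokens.
previous = [1, 15043, 29892, 920, 526]
current = [1, 15043, 29892, 920, 526, 366, 29973]

n_reused, suffix = split_common_prefix(previous, current)
print(n_reused, suffix)  # prints: 5 [366, 29973]
print(completion_body("Hello, how are you?"))
```

Here only the two-token suffix would need prompt processing. As the updated documentation notes, the reused prefix was computed under a different batch size than a from-scratch run would use, so depending on the backend the outputs with `cache_prompt` enabled may differ slightly from those without it.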