ggerganov

Fixes #15894

The default prompt similarity threshold for reusing server slots is arguably too high (0.5 on master). This PR reduces it to 0.1 and adds a couple of log messages to make the slot-selection decisions clearer.
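
To illustrate why the threshold matters, here is a minimal sketch of similarity-based slot selection. It is not the server's actual code: the function names (`common_prefix_len`, `pick_slot`) are hypothetical, and it assumes similarity is measured as the length of the shared token prefix divided by the length of the incoming prompt.

```python
from typing import Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(
    slot_prompts: Sequence[Sequence[int]],
    request: Sequence[int],
    threshold: float,
) -> Optional[int]:
    """Return the index of the slot whose cached prompt best matches the request.

    Assumption: similarity = shared prefix length / request length.
    A slot is only eligible if its similarity exceeds the threshold.
    Returns None when no slot qualifies, so the caller can fall back
    to another policy.
    """
    if not request:
        return None
    best_id: Optional[int] = None
    best_sim = 0.0
    for slot_id, cached in enumerate(slot_prompts):
        sim = common_prefix_len(cached, request) / len(request)
        if sim > threshold and sim > best_sim:
            best_id, best_sim = slot_id, sim
    return best_id


# One slot caches a prompt sharing its first 3 tokens with a 10-token request,
# so the similarity is 0.3 under the assumption above.
slots = [[1, 2, 3, 9, 9]]
request = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
high = pick_slot(slots, request, threshold=0.5)  # slot rejected
low = pick_slot(slots, request, threshold=0.1)   # slot reused
```

Under these assumptions, a threshold of 0.5 rejects a slot whose cache covers 30% of the new prompt, while 0.1 accepts it and lets that cached prefix be reused.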

ggerganov requested a review from ngxson as a code owner September 10, 2025 08:17
ggerganov merged commit f088b6a into master Sep 12, 2025
54 of 55 checks passed
ggerganov deleted the gg/server-adjust-sps branch September 12, 2025 14:02
Development

Successfully merging this pull request may close these issues.

Eval bug: gpt-oss model reprocess the entire prompt from beginning.