fix(backend): Prevent redundant backend validation in worker processes #323
When running a concurrent benchmark, each worker process currently re-validates the backend instance. This leads to multiple, unnecessary "Test connection" requests being sent to the target endpoint, adding startup latency and log verbosity.
Summary
This PR optimizes the startup of concurrent benchmarks by preventing redundant backend validation in worker processes.
The Problem:
When a concurrent benchmark is initiated (e.g., with --rate-type=concurrent), the main process creates and validates the backend, which includes making a "Test connection" request. However, when the worker processes are spawned, each worker receives a copy of the backend object and calls the validate() method again.
This results in N extra validation calls and "Test connection" network requests, where N is the number of worker processes. These redundant requests add startup latency and log verbosity without catching any failures the initial validation would have missed.
Details
This PR introduces a _validated flag to the Backend base class, which is initialized to False.
The validate() method is modified to first check this flag. If True, it returns immediately. If False, it proceeds with the validation logic and sets the flag to True upon successful completion.
Test Plan
Run a benchmark with concurrency greater than 1:
```shell
guidellm benchmark \
  --target "http://10.64.24.34:8000" \
  --processor "Qwen/Qwen3-0.6B" \
  --rate-type=concurrent \
  --rate=5 \
  --max-requests 5 \
  --data='{"prompt_tokens":16, "output_tokens":16}'
```
Old (one validation per worker):

```
25-09-12 15:49:08|INFO |guidellm.backend.backend:validate:127 - OpenAIHTTPBackend validating backend openai_http
25-09-12 15:49:08|INFO |guidellm.backend.backend:validate:127 - OpenAIHTTPBackend validating backend openai_http
25-09-12 15:49:08|INFO |guidellm.backend.backend:validate:127 - OpenAIHTTPBackend validating backend openai_http
25-09-12 15:49:08|INFO |guidellm.backend.backend:validate:127 - OpenAIHTTPBackend validating backend openai_http
25-09-12 15:49:08|INFO |guidellm.backend.backend:validate:127 - OpenAIHTTPBackend validating backend openai_http
```
New (a single validation):

```
25-09-12 15:50:49|INFO |guidellm.backend.backend:create:71 - Creating backend of type openai_http
25-09-12 15:50:49|INFO |guidellm.backend.backend:validate:130 - OpenAIHTTPBackend validating backend openai_http
```
Related Issues
#322
Use of AI
## WRITTEN BY AI ##