feat(data): Default synthetic samples to max_requests #320
Summary
This PR improves the user experience of synthetic data generation by providing a more intelligent default for the number of samples.
The Problem:
Currently, when using synthetic data (e.g., --data='{"prompt_tokens": ...}'), the number of unique samples to generate defaults to 1000. If a user
intends to send only a small number of requests (e.g., --max-requests=50), the tool still generates all 1000 large samples and then uses only
50 of them. This causes a long, unexpected delay at the "Creating request loader..." step.
The Solution:
This PR plumbs the max_requests value down to the SyntheticDatasetCreator. The logic is updated so that if samples is not explicitly defined in
the --data configuration, it will default to the value of max_requests.
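The fallback order described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `resolve_sample_count`, the `data_config` dict shape, and the constant `DEFAULT_SAMPLES` are hypothetical, and falling back to 1000 when `max_requests` is also unset is an assumption based on the previous default.

```python
from typing import Any, Optional

# Previous default mentioned in this PR; reusing it as the last-resort
# fallback is an assumption made for this sketch.
DEFAULT_SAMPLES = 1000


def resolve_sample_count(
    data_config: dict[str, Any],
    max_requests: Optional[int] = None,
) -> int:
    """Pick how many synthetic samples to generate (hypothetical helper).

    Priority:
      1. An explicit "samples" value in the --data configuration.
      2. The max_requests value, if one was given.
      3. The old default of 1000.
    """
    samples = data_config.get("samples")
    if samples is not None:
        # The user asked for a specific number of samples; respect it.
        return int(samples)
    if max_requests is not None:
        # No explicit samples: generate only as many as will be requested.
        return max_requests
    return DEFAULT_SAMPLES


# Example: no "samples" key, so max_requests decides.
config = {"prompt_tokens": 100, "output_tokens": 100}
print(resolve_sample_count(config, max_requests=50))
```

An explicit `samples` value still takes priority in this sketch, so existing configurations that set it would behave as before.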
This makes the behavior more intuitive and significantly reduces the startup time for benchmarks that use synthetic data with a specified number of requests.
Details
Test Plan
guidellm benchmark run --max-requests 50 --data='{"prompt_tokens":100, "output_tokens":100}'
Before this PR: Observe the long delay and the log message: Created loader with 1000 unique requests...
After this PR: Created loader with 50 unique requests...
Related Issues
#319
Use of AI
## WRITTEN BY AI ##