feat(data): Default synthetic samples to max_requests #320

git-jxj · 2025-09-12T06:22:06Z

Summary

This PR improves the user experience of synthetic data generation by providing a more intelligent default for the number of samples.

The Problem:

Currently, when using synthetic data (e.g., --data='{"prompt_tokens": ...}'), the number of unique samples to generate defaults to 1000. If a user
only intends to run a small number of requests (e.g., --max-requests=50), the tool still generates all 1000 samples, which is slow when the samples
are large, and then uses only 50 of them. This causes a long, unexpected delay at the "Creating request loader..." step.

The Solution:

This PR plumbs the max_requests value down to the SyntheticDatasetCreator. The logic is updated so that if samples is not explicitly defined in
the --data configuration, it will default to the value of max_requests.

This makes the behavior much more intuitive:

- If a user runs --max-requests=50, the loader creates 50 samples.
- If a user runs --max-requests=50 but specifies --data='...samples=5', the loader respects the user's choice and creates only 5 samples.
- If max_requests is not set, the loader falls back to the original default of 1000.

This change significantly reduces the startup time for benchmarks that use synthetic data with a specified number of requests.
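
The precedence described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name `resolve_sample_count`, the constant `DEFAULT_SAMPLES`, and the plain-dict config are hypothetical stand-ins for whatever SyntheticDatasetCreator actually uses.

```python
from typing import Optional

# Original default stated in this PR description.
DEFAULT_SAMPLES = 1000


def resolve_sample_count(data_config: dict, max_requests: Optional[int] = None) -> int:
    """Hypothetical helper (not the guidellm API) showing the precedence:
    explicit samples > max_requests > original default."""
    explicit = data_config.get("samples")
    if explicit is not None:
        # The user set samples in --data, so respect that choice.
        return int(explicit)
    if max_requests is not None:
        # No explicit samples: generate only as many as will be requested.
        return int(max_requests)
    # Neither is set: keep the previous behavior.
    return DEFAULT_SAMPLES


config = {"prompt_tokens": 100, "output_tokens": 100}
print(resolve_sample_count(config, max_requests=50))
print(resolve_sample_count({**config, "samples": 5}, max_requests=50))
print(resolve_sample_count(config))
```

Checking `is not None` rather than truthiness is a deliberate choice in this sketch, so that an explicit value is never silently replaced by a fallback.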

Details

[ ]

Test Plan

guidellm benchmark run --max-requests 50 --data='{"prompt_tokens":100, "output_tokens":100}'

Before this PR: Observe the long delay and the log message: Created loader with 1000 unique requests...
After this PR: Observe a much shorter delay and the log message: Created loader with 50 unique requests...

Related Issues

#319

Resolves #

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Signed-off-by: xinjun.jiang <[email protected]>

feat(data): Default synthetic samples to max_requests

4afa1cf

Signed-off-by: xinjun.jiang <[email protected]>

git-jxj mentioned this pull request Sep 12, 2025

Guidellm takes several minutes to create random requests with long prompts #270

Open
