git-jxj (Contributor) commented Sep 12, 2025

Summary

This PR improves the user experience of synthetic data generation by providing a more intelligent default for the number of samples.

The Problem:

Currently, when using synthetic data (e.g., --data='{"prompt_tokens": ...}'), the number of unique samples to generate defaults to 1000. If a user
only intends to send a small number of requests (e.g., --max-requests=50), the tool still generates all 1000 samples, which is slow when each
sample is large, and then uses only 50 of them. This causes a long, unexpected delay at the "Creating request loader..." step.

The Solution:

This PR passes the max_requests value down to the SyntheticDatasetCreator. If samples is not explicitly set in the --data configuration, it now
defaults to the value of max_requests.

This makes the behavior much more intuitive:

  • If a user runs --max-requests=50, the loader creates 50 samples.
  • If a user runs --max-requests=50 but specifies --data='...samples=5', the loader respects the user's choice and creates only 5 samples.
  • If max_requests is not set, the loader falls back to the original default of 1000.
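
The precedence described in the three cases above can be sketched as follows. This is a minimal illustration, not the code in this PR: resolve_sample_count, its parameter names, and DEFAULT_SAMPLES are hypothetical names chosen for the example, and the real SyntheticDatasetCreator may structure this differently.

```python
from typing import Optional

# Original default described in this PR; the constant name is hypothetical.
DEFAULT_SAMPLES = 1000


def resolve_sample_count(
    configured_samples: Optional[int],
    max_requests: Optional[int],
    default: int = DEFAULT_SAMPLES,
) -> int:
    """Pick how many synthetic samples to generate.

    Hypothetical helper for illustration only; it is not part of guidellm.
    """
    # An explicit samples value in the --data config always wins.
    if configured_samples is not None:
        return configured_samples
    # Otherwise generate only as many samples as requests will be sent.
    if max_requests is not None:
        return max_requests
    # Neither is set: fall back to the original default.
    return default
```

Under these assumptions, resolve_sample_count(None, 50) gives 50, resolve_sample_count(5, 50) gives 5, and resolve_sample_count(None, None) gives 1000, matching the three cases listed above.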

This change reduces startup time for synthetic-data benchmarks that set max_requests below the previous default of 1000.

Details

  • [ ]

Test Plan

guidellm benchmark run --max-requests 50 --data='{"prompt_tokens":100, "output_tokens":100}'

Before this PR: There is a long delay, and the log message reads: Created loader with 1000 unique requests...
After this PR: The log message reads: Created loader with 50 unique requests...

Related Issues

#319

  • Resolves #

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)
