Loadgen concurrent load type #263
base: main
Conversation
/assign @achandrasekar
jjk-g left a comment:
Thank you for adding this!
Latest test: validation test for loadgen config

Misconfigured YAML:

load:
  type: constant
  stages:
  - rate: 50.0
    duration: 1
    num_requests: 50
    concurrency_level: 6
  - rate: 25.0
    duration: 1
    num_requests: 25
    concurrency_level: 2
api:
  type: completion
  streaming: true
server:
  type: vllm
  model_name: HuggingFaceTB/SmolLM2-135M-Instruct
  base_url: http://0.0.0.0:8000
  ignore_eos: true
tokenizer:
  pretrained_model_name_or_path: HuggingFaceTB/SmolLM2-135M-Instruct
data:
  type: shareGPT
metrics:
  type: prometheus
  prometheus:
    url: http://localhost:9090
    scrape_interval: 15
report:
  request_lifecycle:
    summary: true
    per_stage: true
    per_request: false
  prometheus:
    summary: true
    per_stage: false

python3 inference_perf/main.py -c config.yml
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2025-10-30 14:48:15,299 - inference_perf.config - INFO - Using configuration from: config.yml
Traceback (most recent call last):
File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/main.py", line 332, in <module>
main_cli()
File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/main.py", line 118, in main_cli
config = read_config(args.config_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/config.py", line 298, in read_config
converted_stages.append(StandardLoadStage(**stage))
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/venv/lib/python3.12/site-packages/pydantic/main.py", line 253, in __init__
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for StandardLoadStage
num_requests
Input should be None [type=none_required, input_value=50, input_type=int]
For further information visit https://errors.pydantic.dev/2.11/v/none_required
concurrency_level
Input should be None [type=none_required, input_value=6, input_type=int]
For further information visit https://errors.pydantic.dev/2.11/v/none_required

Functional test (running inference):

stage_0_lifecycle_metrics.json
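For reference, the validation errors above say that num_requests and concurrency_level must be unset for the constant load type; they belong to the concurrent type, as the functional-test config further down also shows. A load section along these lines should pass validation (a minimal sketch inferred from the error messages and that config, not taken from the project's docs):

load:
  type: concurrent
  stages:
  - num_requests: 50
    concurrency_level: 6
  - num_requests: 25
    concurrency_level: 2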
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: changminbark

The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-push: d807819 to 4240859
New changes are detected. LGTM label has been removed.
Just did a rebase and tested again:

Test: misconfigured YAML

load:
  type: constant
  stages:
  - num_requests: 50
    concurrency_level: 6
    rate: 50.0
    duration: 1
  - num_requests: 25
    concurrency_level: 2
    rate: 25.0
    duration: 1
api:
  type: completion
  streaming: true
server:
  type: vllm
  model_name: HuggingFaceTB/SmolLM2-135M-Instruct
  base_url: http://0.0.0.0:8000
  ignore_eos: true
tokenizer:
  pretrained_model_name_or_path: HuggingFaceTB/SmolLM2-135M-Instruct
data:
  type: shareGPT
metrics:
  type: prometheus
  prometheus:
    url: http://localhost:9090
    scrape_interval: 15
report:
  request_lifecycle:
    summary: true
    per_stage: true
    per_request: false
  prometheus:
    summary: true
    per_stage: false
(venv) chang-min@chang-min-GE66-Raider-10SF:~/Desktop/OpenSource/k8s/inference-perf$ python3 inference_perf/main.py -c config.yml
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2025-11-14 23:28:58,812 - inference_perf.config - INFO - Using configuration from: config.yml
Traceback (most recent call last):
File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/main.py", line 331, in <module>
main_cli()
File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/main.py", line 118, in main_cli
config = read_config(args.config_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/config.py", line 310, in read_config
converted_stages.append(StandardLoadStage(**stage))
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/venv/lib/python3.12/site-packages/pydantic/main.py", line 253, in __init__
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for StandardLoadStage
Value error, num_requests should not be set for CONSTANT/POISSON load types [type=value_error, input_value={'num_requests': 50, 'con...e': 50.0, 'duration': 1}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.11/v/value_error

Functional test:

api:
  type: completion
  streaming: true
  headers: null
data:
  type: shareGPT
  path: null
  input_distribution: null
  output_distribution: null
  shared_prefix: null
  trace: null
load:
  type: concurrent
  interval: 1.0
  stages:
  - num_requests: 50
    concurrency_level: 6
    rate: 50.0
    duration: 1
  - num_requests: 25
    concurrency_level: 2
    rate: 25.0
    duration: 1
  sweep: null
  num_workers: 16
  worker_max_concurrency: 0
  worker_max_tcp_connections: 2500
  trace: null
  circuit_breakers: []
  request_timeout: null
metrics:
  type: prometheus
  prometheus:
    scrape_interval: 15
    url: http://localhost:9090/
    filters: []
    google_managed: false
report:
  request_lifecycle:
    summary: true
    per_stage: true
    per_request: false
  prometheus:
    summary: true
    per_stage: false
storage:
  local_storage:
    path: reports-20251114-231857
    report_file_prefix: null
  google_cloud_storage: null
  simple_storage_service: null
server:
  type: vllm
  model_name: HuggingFaceTB/SmolLM2-135M-Instruct
  base_url: http://0.0.0.0:8000
  ignore_eos: true
  api_key: null
tokenizer:
  pretrained_model_name_or_path: HuggingFaceTB/SmolLM2-135M-Instruct
  trust_remote_code: null
  token: null
circuit_breakers: null
stage_0_lifecycle_metrics.json
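The improved error above ("num_requests should not be set for CONSTANT/POISSON load types") suggests the config model now cross-checks stage fields against the load type. A minimal sketch of how such a check can be expressed with a pydantic model_validator; the class and field names here are illustrative, not necessarily the PR's actual code:

from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, model_validator


class LoadType(str, Enum):
    CONSTANT = "constant"
    POISSON = "poisson"
    CONCURRENT = "concurrent"


class StageSketch(BaseModel):
    # Timed load types (constant/poisson) are driven by rate + duration.
    rate: Optional[float] = None
    duration: Optional[int] = None
    # Concurrent load is driven by a fixed number of in-flight requests.
    num_requests: Optional[int] = None
    concurrency_level: Optional[int] = None


class LoadSketch(BaseModel):
    type: LoadType
    stages: List[StageSketch]

    @model_validator(mode="after")
    def check_stage_fields(self) -> "LoadSketch":
        for stage in self.stages:
            if self.type in (LoadType.CONSTANT, LoadType.POISSON) and (
                stage.num_requests is not None or stage.concurrency_level is not None
            ):
                raise ValueError("num_requests should not be set for CONSTANT/POISSON load types")
            if self.type is LoadType.CONCURRENT and stage.concurrency_level is None:
                raise ValueError("concurrency_level is required for CONCURRENT load")
        return self

Raising ValueError inside a model_validator is what surfaces as the pydantic ValidationError shown in the traceback above.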
@jjk-g can you review the concurrency load gen pattern here when you get a chance?
Force-push: 78d2482 to 6ca2a9c
PR Template
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces a way to produce a constant level of concurrency per stage, which is needed to understand how the system performs under sustained concurrent load. It works by capping the max concurrency of the workers in every stage so that the desired concurrency level is maintained.
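Conceptually, a per-stage concurrency cap behaves like the sketch below: a semaphore sized to the stage's concurrency_level keeps at most that many requests in flight until num_requests have been issued. This is a simplified illustration with hypothetical names, not the PR's actual worker code:

import asyncio
from typing import Awaitable, Callable


async def run_stage(
    send_request: Callable[[int], Awaitable[None]],
    num_requests: int,
    concurrency_level: int,
) -> None:
    # The semaphore caps the number of in-flight requests, so the stage
    # holds a roughly constant concurrency of `concurrency_level`.
    semaphore = asyncio.Semaphore(concurrency_level)

    async def bounded(i: int) -> None:
        async with semaphore:
            await send_request(i)

    # As each request finishes it releases a slot, letting the next one start.
    await asyncio.gather(*(bounded(i) for i in range(num_requests)))

With this pattern, throughput floats with server latency while concurrency stays fixed, the inverse of the rate-driven constant/poisson load types.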
Which issue(s) this PR fixes:
Fixes #252
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Testing
Testing was done using the config.yml file shown below, with the necessary services running (vLLM serving HuggingFaceTB/SmolLM2-135M-Instruct and a local Prometheus).
Functional test output:
config.yaml
stage_0_lifecycle_metrics.json
stage_1_lifecycle_metrics.json
summary_lifecycle_metrics.json
summary_prometheus_metrics.json