Update guidellm #6

Open: wants to merge 91 commits into base: main

Chibukach (Collaborator) commented Jul 3, 2025

This PR updates the guidellm automation in ClearML to use the latest guidellm Pythonic interface with benchmarking scenarios.
The standard research benchmarking scenarios have been updated to reflect this change.
It also adds support for development using custom branches in a single location.

@Chibukach Chibukach requested a review from anmarques July 3, 2025 14:52
):
task = Task.current_task()

print("Inside start vllm server")
Member

Is this a debugging print?

executable_path = os.path.dirname(sys.executable)
vllm_path = os.path.join(executable_path, "vllm")

num_gpus = torch.cuda.device_count()
available_gpus = list(range(torch.cuda.device_count()))
Member

Is there any reason lines 27-31 are necessary? As far as I know, using tensor-parallel-size will do the same with or without this.
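A minimal sketch of the point above, assuming the server command is assembled as a list and that `vllm_args` is a plain dict of CLI options. `build_server_command` is a hypothetical helper, not code from this PR; `--tensor-parallel-size` is the vLLM option the comment refers to.

```python
def build_server_command(vllm_path, model_id, host, port, vllm_args):
    """Assemble a `vllm serve` command list (hypothetical helper)."""
    command = [vllm_path, "serve", model_id, "--host", host, "--port", str(port)]
    for key, value in vllm_args.items():
        # Convert e.g. "tensor_parallel_size" into "--tensor-parallel-size".
        flag = "--" + key.replace("_", "-")
        if isinstance(value, bool):
            # Boolean options are passed as bare flags, and only when enabled.
            if value:
                command.append(flag)
        else:
            command.extend([flag, str(value)])
    return command


command = build_server_command(
    "vllm", "my-model", "localhost", 8000, {"tensor_parallel_size": 2}
)
```

With this shape the number of GPUs used is determined by the `tensor_parallel_size` entry alone, with no separate GPU enumeration step.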


parsed_target = urlparse(target)
print(f"vllm path is: {vllm_path}")
Member

Another debugging print?

]

print(server_command)
Member

Debugging print?

server_process = subprocess.Popen(server_command, stdout=server_log_file, stderr=server_log_file, shell=False, env=subprocess_env)

delay = 5
server_initialized = False
for _ in range(server_wait_time // delay):
try:
response = requests.get(target + "/models")
print(f"response: {response}")
Member

Debugging print?
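One way to keep the diagnostics without leaving `print` calls in the polling loop is to route them through `logging` at debug level. The sketch below is only an illustration under assumptions: `wait_for_server` and its `probe` callable are hypothetical names, and the loop bounds mirror the `server_wait_time // delay` pattern shown in the diff.

```python
import logging
import time

logger = logging.getLogger(__name__)


def wait_for_server(probe, wait_time, delay=5, sleep=time.sleep):
    """Poll `probe` until it returns True or `wait_time` seconds elapse."""
    for attempt in range(wait_time // delay):
        try:
            if probe():
                logger.debug("server ready after %d attempt(s)", attempt + 1)
                return True
        except Exception as error:
            # Typically a connection error while the server is still starting.
            logger.debug("probe failed: %s", error)
        sleep(delay)
    return False
```

In the PR's context, `probe` could be something like `lambda: requests.get(target + "/models").status_code == 200`; the debug messages then appear only when the logger is configured for that level.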



- def main(configurations=None):
+ def main():
Member

The reason I passed configurations as an argument is to enable execute_locally. In my experience, when executing the task locally, the process doesn't see the configuration_object for some reason, so I pass the config as a dict directly to the process. This only works for local processes.

Also, I see that you either try to fetch the configuration object or assume that it will be replaced by get_parameters_dict. In ClearML, parameters and configs are different things, and you cannot replace one with the other.
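A rough sketch of the distinction, assuming the task object exposes ClearML's `get_parameters_as_dict()` for hyperparameters and `get_configuration_object_as_dict(name)` for configuration objects. `load_settings` and the configuration name `"configuration"` are hypothetical and only illustrate keeping the two sources separate while still allowing a dict to be passed in for local runs.

```python
def load_settings(task, configurations=None, config_name="configuration"):
    """Return (parameters, configurations) without mixing the two sources."""
    # Hyperparameters always come from the task's parameter sections.
    parameters = task.get_parameters_as_dict()
    if configurations is None:
        # Remote run: read the configuration object stored on the task.
        configurations = task.get_configuration_object_as_dict(config_name) or {}
    # Local run: the caller passed the configuration dict in directly.
    return parameters, configurations
```

Keeping `configurations` as an optional argument preserves the local-execution path described above, while the remote path still reads the configuration object rather than the parameters.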

@@ -14,44 +14,58 @@ def start_vllm_server(
vllm_args,
model_id,
target,
server_wait_time,
server_wait_time,
gpu_count,
Member

Why add gpu count as an argument? In general, if you are running a vLLM server on a remote server, why would you not use all GPUs?

# Resolve model_id
model_id = resolve_model_id(args["Args"]["model"], clearml_model, force_download)

gpu_count = int(guidellm_args.get("gpu_count", 1))
Member

Why add gpu count as an argument? Why would someone use fewer GPUs than all available?
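For reference, if a `gpu_count` argument were kept, one conceivable use is limiting which devices the server subprocess can see through `CUDA_VISIBLE_DEVICES`. This is only a sketch of that idea, not necessarily what the PR does; `restrict_visible_gpus` is a hypothetical helper.

```python
import os


def restrict_visible_gpus(available_gpus, gpu_count, base_env=None):
    """Return a copy of the environment exposing only the first `gpu_count` GPUs."""
    if gpu_count < 1 or gpu_count > len(available_gpus):
        raise ValueError(f"gpu_count must be between 1 and {len(available_gpus)}")
    env = dict(os.environ if base_env is None else base_env)
    selected = available_gpus[:gpu_count]
    # CUDA reads this variable at process start, so it must be set before launch.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(index) for index in selected)
    return env


subprocess_env = restrict_visible_gpus([0, 1, 2, 3], 2, base_env={})
```

The resulting dict would be passed as `env=` to `subprocess.Popen`. If every run is meant to use all GPUs, this step and the argument are unnecessary.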

else:
filepath = Path(os.path.join(".", "src", "automation", "standards", "benchmarking", f"{DEFAULT_GUIDELLM_SCENARIO}.json"))
current_scenario = GenerativeTextScenario.from_file(filepath, dict(guidellm_args))
print(current_scenario.model_fields)
Member

Is this a debugging print?

from guidellm.benchmark.scenario import GenerativeTextScenario, get_builtin_scenarios

user_scenario = guidellm_args.get("scenario", "")
if user_scenario:
Member

Let's add a comment here to clarify that all these scenarios are to be defined in guidellm, not here. This is a temporary solution.
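The requested comment could look roughly like the sketch below. `scenario_path` is a hypothetical helper, and the lookup logic is an assumption based on the default-scenario path shown in the diff, not the PR's actual branch handling.

```python
from pathlib import Path


def scenario_path(user_scenario, default_scenario, standards_dir):
    """Pick the scenario file for a run (illustrative only)."""
    # NOTE: temporary solution. These scenarios are meant to be defined in
    # guidellm itself; the local JSON files should go away once they are.
    name = user_scenario if user_scenario else default_scenario
    return Path(standards_dir) / f"{name}.json"


standards_dir = Path("src") / "automation" / "standards" / "benchmarking"
```

The returned path would then be handed to `GenerativeTextScenario.from_file`, as in the diff above.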
