Update guidellm #6
):
    task = Task.current_task()

    print("Inside start vllm server")
Is this a debugging print?
executable_path = os.path.dirname(sys.executable)
vllm_path = os.path.join(executable_path, "vllm")

num_gpus = torch.cuda.device_count()
available_gpus = list(range(torch.cuda.device_count()))
Is there any reason lines 27-31 are necessary? As far as I know, using tensor-parallel-size will do the same with or without this.
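A sketch of what I have in mind, assuming the server is launched as `vllm serve <model>`. `build_server_command` is a hypothetical helper, not code from this PR. With `--tensor-parallel-size` passed through, the command can be assembled without enumerating devices first:

```python
def build_server_command(vllm_path, model_id, port, tensor_parallel_size=None):
    """Assemble the argv list for launching a vLLM server (hypothetical helper)."""
    command = [vllm_path, "serve", model_id, "--port", str(port)]
    if tensor_parallel_size is not None:
        # Only add the flag when the caller asked for tensor parallelism;
        # no explicit list of GPU indices is built here.
        command += ["--tensor-parallel-size", str(tensor_parallel_size)]
    return command
```

Returning a list keeps it compatible with `subprocess.Popen(..., shell=False)` as used further down.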
parsed_target = urlparse(target)
print(f"vllm path is: {vllm_path}")
Another debugging print?
]

print(server_command)
Debugging print?
server_process = subprocess.Popen(server_command, stdout=server_log_file, stderr=server_log_file, shell=False, env=subprocess_env)

delay = 5
server_initialized = False
for _ in range(server_wait_time // delay):
    try:
        response = requests.get(target + "/models")
        print(f"response: {response}")
Debugging print?
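If the output is worth keeping, one option is to route it through `logging` rather than `print`. A minimal sketch of the wait loop under that assumption follows. `wait_for_server` and its `probe` callable are hypothetical names, not code from this PR:

```python
import logging
import time

logger = logging.getLogger(__name__)


def wait_for_server(probe, server_wait_time, delay=5, sleep=time.sleep):
    """Poll probe() until it returns True or the time budget runs out."""
    for attempt in range(server_wait_time // delay):
        try:
            if probe():
                logger.debug("Server ready after %d attempt(s)", attempt + 1)
                return True
        except OSError as exc:
            # requests' exceptions derive from IOError/OSError, so a refused
            # connection while the server is still starting lands here.
            logger.debug("Server not ready yet: %s", exc)
        sleep(delay)
    return False
```

Here `probe` could be something like `lambda: requests.get(target + "/models").status_code == 200`, and the messages only show up when the log level is set to DEBUG.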
def main(configurations=None):
def main():
The reason I passed configurations as an argument is to enable execute_locally. In my experience, when the task is executed locally, the process doesn't see the configuration_object for some reason, so I pass the config as a dict directly to the process. This only works for local processes.
Also, I see that you either try to fetch the configuration object or assume that it will be replaced by get_parameters_dict. In ClearML, parameters and configs are different things, and you cannot replace one with the other.
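Roughly what I mean, as a sketch only. `load_configuration` and the `name` argument are hypothetical. `get_configuration_object_as_dict` is the ClearML `Task` method for configuration objects, which are stored separately from hyperparameters:

```python
def load_configuration(task, name, configurations=None):
    """Return the config dict, preferring one passed in directly (local runs)."""
    if configurations is not None:
        # Local execution: the caller hands the config over as a plain dict.
        return dict(configurations)
    # Remote execution: read the configuration object attached to the task.
    config = task.get_configuration_object_as_dict(name)
    if config is None:
        raise ValueError(f"Configuration object {name!r} not found on task")
    return config
```

With this shape, `main(configurations=None)` keeps working for execute_locally, and the remote path never falls back to the parameters dict.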
@@ -14,44 +14,58 @@ def start_vllm_server(
    vllm_args,
    model_id,
    target,
    server_wait_time,
    server_wait_time,
    gpu_count,
Why add gpu_count as an argument? In general, if you are running a vLLM server on a remote machine, why would you not use all GPUs?
# Resolve model_id
model_id = resolve_model_id(args["Args"]["model"], clearml_model, force_download)

gpu_count = int(guidellm_args.get("gpu_count", 1))
Why add gpu_count as an argument? Why would someone use fewer GPUs than all that are available?
else:
    filepath = Path(os.path.join(".", "src", "automation", "standards", "benchmarking", f"{DEFAULT_GUIDELLM_SCENARIO}.json"))
    current_scenario = GenerativeTextScenario.from_file(filepath, dict(guidellm_args))
    print(current_scenario.model_fields)
Is this a debugging print?
from guidellm.benchmark.scenario import GenerativeTextScenario, get_builtin_scenarios

user_scenario = guidellm_args.get("scenario", "")
if user_scenario:
Let's add a comment here to clarify that all these scenarios should be defined in guidellm, not here. This is a temporary solution.
This PR updates the guidellm automation in ClearML to use the latest guidellm Pythonic interface with benchmarking scenarios.
The standard research benchmarking scenarios have been updated to reflect this change.
It also adds support for development using custom branches in a single location.