Enhance the container family validation for multi-model deployment #1148
# The structure is:
# - Key: The preferred container family to use when multiple compatible families are selected.
# - Value: A list of all compatible families (including the preferred one).
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {
Is there a specific reason why we chose odsc-vllm-v1 as the key and not odsc-vllm?
If I understand correctly, when two or more models are chosen, with some compatible with odsc-vllm-v1 and others with odsc-vllm, the group will be deployed with odsc-vllm-v1. And if all selected models are compatible with odsc-vllm, do we still go ahead and deploy with odsc-vllm-v1?
Correct me if I am wrong. @mrDzurb
I see.
I believe odsc-vllm-v1 is preferred in both cases.
There is no perfect solution for this, in my opinion. Ideally, we would re-test all service models and update them to use the latest container, but that would be too time-consuming. For now, this is just a best-effort attempt to choose the most recent container family when models from different families are mixed. Hopefully, vLLM will continue to improve, and the enhancement introduced in this PR will become more robust.
@@ -1316,3 +1317,40 @@ def load_gpu_shapes_index(
    )

    return GPUShapesIndex(**data)


def get_preferred_compatible_family(selected_families: set[str]) -> str:
nit: use -> Optional[str] instead of str.
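To illustrate the nit, here is a minimal sketch of what the suggested annotation implies, assuming the function returns None when the selected families do not all fall into one compatibility group. The function body is not shown in this excerpt, so the lookup logic below is an assumption, and the map contents are taken from the PR description rather than from the diff.

```python
from typing import Dict, List, Optional, Set

# Assumed contents, based on the PR description:
# key = preferred family, value = all compatible families (preferred included).
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {
    "odsc-vllm-serving-v1": ["odsc-vllm-serving-v1", "odsc-vllm-serving"],
}


def get_preferred_compatible_family(selected_families: Set[str]) -> Optional[str]:
    """Return the preferred family covering all selected families, or None.

    Sketch only: the real implementation in the PR may differ.
    """
    for preferred, compatible in CONTAINER_FAMILY_COMPATIBILITY.items():
        # Every selected family must belong to the same compatibility group.
        if selected_families and selected_families.issubset(compatible):
            return preferred
    # No single group covers the selection, hence Optional[str].
    return None
```

Under these assumptions, a selection containing only "odsc-vllm-serving" still resolves to "odsc-vllm-serving-v1", which matches the behavior discussed in the thread above.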
Description
This PR introduces support for relaxing container family validation in multi-model deployment by incorporating container family compatibility rules.
Enhancements:
- CONTAINER_FAMILY_COMPATIBILITY map, which defines compatible container families and the preferred family when multiple compatible types are detected.
- "odsc-vllm-serving" and "odsc-vllm-serving-v1" are now treated as compatible; "odsc-vllm-serving-v1" is preferred.
Added:
- get_preferred_compatible_family(...)
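The relaxed validation described above could be sketched roughly as follows. This is not the PR's code: resolve_deployment_family is a hypothetical helper, the map contents are assumed from the description, and "odsc-tgi-serving" appears in the usage note only as an illustrative family outside the vLLM group.

```python
from typing import Dict, List

# Assumed map contents, based on the description above.
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {
    "odsc-vllm-serving-v1": ["odsc-vllm-serving-v1", "odsc-vllm-serving"],
}


def resolve_deployment_family(model_families: Dict[str, str]) -> str:
    """Hypothetical helper: pick one container family for a model group.

    `model_families` maps a model name to its container family.
    """
    selected = set(model_families.values())
    if not selected:
        raise ValueError("No models selected.")
    # Relaxed rule: families in the same compatibility group may be mixed,
    # and the group's preferred family is used for the deployment.
    for preferred, compatible in CONTAINER_FAMILY_COMPATIBILITY.items():
        if selected.issubset(compatible):
            return preferred
    # A single family outside any group is still a valid selection.
    if len(selected) == 1:
        return next(iter(selected))
    raise ValueError(
        f"Incompatible container families selected: {sorted(selected)}"
    )
```

With this sketch, mixing "odsc-vllm-serving" and "odsc-vllm-serving-v1" resolves to "odsc-vllm-serving-v1", while mixing either of them with an unrelated family such as "odsc-tgi-serving" raises a ValueError.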