Enhance the container family validation for multi-model deployment #1148

Merged (2 commits) on Apr 7, 2025

@mrDzurb (Member) commented Apr 7, 2025

Description

This PR introduces support for relaxing container family validation in multi-model deployment by incorporating container family compatibility rules.

Enhancements:

  • Introduced the CONTAINER_FAMILY_COMPATIBILITY map, which defines which container families are compatible with each other and which family is preferred when several compatible families are detected.
    • Example: "odsc-vllm-serving" and "odsc-vllm-serving-v1" are now treated as compatible; "odsc-vllm-serving-v1" is preferred.
  • Updated model validation logic to:
    • Detect when all models belong to compatible families.
    • Automatically select the preferred container family for deployment.
    • Raise errors when incompatible or missing families are found.

Added:

  • Utility function: get_preferred_compatible_family(...)
    • Returns the best-suited family from a list based on the compatibility map.
  • Unit tests for get_preferred_compatible_family.
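
A minimal sketch of how such a lookup could work, assuming the map holds only the pair named in the example above and that a subset check decides compatibility. The function body, the empty-set guard, and the family name "some-other-family" are illustrative assumptions, not the merged implementation.

```python
from typing import Dict, List, Optional, Set

# Assumed contents: only this pair is named in the PR description.
# Key: preferred family. Value: all compatible families, including the key.
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {
    "odsc-vllm-serving-v1": ["odsc-vllm-serving", "odsc-vllm-serving-v1"],
}


def get_preferred_compatible_family(selected_families: Set[str]) -> Optional[str]:
    """Return the preferred family if every selected family is compatible with it.

    Sketch only: returns None for an empty selection or when no single
    compatibility group covers all selected families.
    """
    if not selected_families:
        return None
    for preferred, compatible in CONTAINER_FAMILY_COMPATIBILITY.items():
        # issubset accepts any iterable, so the list value can be used directly.
        if selected_families.issubset(compatible):
            return preferred
    return None


# Mixed but compatible families resolve to the preferred one.
print(get_preferred_compatible_family({"odsc-vllm-serving", "odsc-vllm-serving-v1"}))
# "some-other-family" is a made-up name used to show the incompatible case.
print(get_preferred_compatible_family({"odsc-vllm-serving", "some-other-family"}))
```

Under these assumptions the first call prints odsc-vllm-serving-v1 and the second prints None.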

@oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) on Apr 7, 2025
# The structure is:
# - Key: The preferred container family to use when multiple compatible families are selected.
# - Value: A list of all compatible families (including the preferred one).
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {

@kumar-shivam-ranjan (Member) commented Apr 7, 2025

Is there a specific reason why odsc-vllm-v1 was chosen as the key rather than odsc-vllm?
If I understand correctly, when two or more models are selected, some compatible with odsc-vllm-v1 and others with odsc-vllm, the group will be deployed with odsc-vllm-v1. And if all selected models are compatible with odsc-vllm, do we still deploy with odsc-vllm-v1?
Correct me if I am wrong. @mrDzurb

Member

I see.
I believe odsc-vllm-v1 is preferred in both cases.

Member Author

There is no perfect solution for this, in my opinion. Ideally, we would re-test all service models and update them to use the latest container, but that would be too time-consuming. For now, this is just a best-effort attempt to choose the most recent container family when models from different families are mixed. Hopefully, vLLM will continue to improve, and the enhancement introduced in this PR will become more robust.

github-actions bot commented Apr 7, 2025

📌 Cov diff with main: 76%

📌 Overall coverage: 58.73%

@@ -1316,3 +1317,40 @@ def load_gpu_shapes_index(
)

return GPUShapesIndex(**data)


def get_preferred_compatible_family(selected_families: set[str]) -> str:
Member

nit: use -> Optional[str] instead of str.
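
The nit suggests the function may return None when no compatible family exists, in which case callers have to handle that value explicitly. The sketch below shows one way a caller could do so while raising on missing or incompatible families, as the description outlines. The name resolve_deployment_family, the error messages, the map contents, and the handling of a single family that is not in the map are all assumptions made for illustration, not the merged code.

```python
from typing import Dict, List, Optional, Set

# Assumed contents: only this pair is named in the PR description.
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {
    "odsc-vllm-serving-v1": ["odsc-vllm-serving", "odsc-vllm-serving-v1"],
}


def get_preferred_compatible_family(selected_families: Set[str]) -> Optional[str]:
    """Return the preferred family, or None when no group covers the selection."""
    for preferred, compatible in CONTAINER_FAMILY_COMPATIBILITY.items():
        if selected_families and selected_families.issubset(compatible):
            return preferred
    return None  # This branch is why Optional[str] is the accurate annotation.


def resolve_deployment_family(model_families: List[Optional[str]]) -> str:
    """Hypothetical caller: pick one container family for a group of models."""
    # Missing families: a model without a declared family cannot be validated.
    if not model_families or any(not family for family in model_families):
        raise ValueError("Every model must declare a container family.")

    selected = {family for family in model_families if family}
    preferred = get_preferred_compatible_family(selected)
    if preferred is not None:
        return preferred
    # Assumption: a single family outside the map is deployed as-is.
    if len(selected) == 1:
        return next(iter(selected))
    raise ValueError(f"Incompatible container families: {sorted(selected)}")
```

With the return type declared as Optional[str], a type checker can flag callers that use the result without the None check shown here.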

@mrDzurb mrDzurb merged commit e65f09c into main Apr 7, 2025
24 checks passed
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
3 participants