@mrDzurb (Member) commented Apr 7, 2025

Description

This PR introduces support for relaxing container family validation in multi-model deployment by incorporating container family compatibility rules.

Enhancements:

  • Introduced the CONTAINER_FAMILY_COMPATIBILITY map, which defines compatible container families and the preferred family to use when multiple compatible families are detected.
    • Example: "odsc-vllm-serving" and "odsc-vllm-serving-v1" are now treated as compatible; "odsc-vllm-serving-v1" is preferred.
  • Updated model validation logic to:
    • Detect when all models belong to compatible families.
    • Automatically select the preferred container family for deployment.
    • Raise errors when incompatible or missing families are found.
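
The validation flow described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function name `resolve_deployment_family`, the error messages, and the handling of a single family that is not in the map are all assumptions. Only the map name and the two vLLM family names come from the description.

```python
from typing import Dict, List, Optional

# Key: preferred family; value: all compatible families (including the key).
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {
    "odsc-vllm-serving-v1": ["odsc-vllm-serving", "odsc-vllm-serving-v1"],
}


def resolve_deployment_family(model_families: List[Optional[str]]) -> str:
    """Hypothetical helper: pick one container family for a group of models."""
    # A missing family on any model is an error.
    if not model_families or any(not family for family in model_families):
        raise ValueError("Every model must declare a container family.")

    selected = set(model_families)

    # All families fit inside one compatibility group: use its preferred key.
    for preferred, compatible in CONTAINER_FAMILY_COMPATIBILITY.items():
        if selected.issubset(compatible):
            return preferred

    # Assumption: a single family outside the map needs no reconciliation.
    if len(selected) == 1:
        return next(iter(selected))

    raise ValueError(f"Incompatible container families: {sorted(selected)}")
```

Under these assumptions, a group that mixes "odsc-vllm-serving" and "odsc-vllm-serving-v1" resolves to "odsc-vllm-serving-v1", while a group that mixes either of them with an unrelated family raises an error.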

Added:

  • Utility function: get_preferred_compatible_family(...)
    • Returns the best-suited family from a list based on the compatibility map.
  • Unit tests for get_preferred_compatible_family.
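
For illustration, a helper of this kind and a matching unit test might look like the sketch below. The function body, the `None` return for unmatched input, and the test cases are assumptions made for this example (including the placeholder name "unknown-family"); they are not copied from the PR.

```python
from typing import Dict, List, Optional, Set

# Key: preferred family; value: all compatible families (including the key).
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {
    "odsc-vllm-serving-v1": ["odsc-vllm-serving", "odsc-vllm-serving-v1"],
}


def get_preferred_compatible_family(selected_families: Set[str]) -> Optional[str]:
    """Return the preferred family if all selected families share one group."""
    if not selected_families:
        return None
    for preferred, compatible in CONTAINER_FAMILY_COMPATIBILITY.items():
        if selected_families.issubset(compatible):
            return preferred
    # No single compatibility group covers every selected family.
    return None


def test_get_preferred_compatible_family() -> None:
    cases = [
        ({"odsc-vllm-serving", "odsc-vllm-serving-v1"}, "odsc-vllm-serving-v1"),
        ({"odsc-vllm-serving"}, "odsc-vllm-serving-v1"),
        ({"odsc-vllm-serving", "unknown-family"}, None),
        (set(), None),
    ]
    for families, expected in cases:
        assert get_preferred_compatible_family(families) == expected
```

The subset check means the map order only matters if a family appears in more than one group, which the single entry shown here does not exercise.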

@oracle-contributor-agreement bot added the "OCA Verified" label (all contributors have signed the Oracle Contributor Agreement) on Apr 7, 2025

# The structure is:
# - Key: The preferred container family to use when multiple compatible families are selected.
# - Value: A list of all compatible families (including the preferred one).
CONTAINER_FAMILY_COMPATIBILITY: Dict[str, List[str]] = {
Contributor

@kumar-shivam-ranjan kumar-shivam-ranjan Apr 7, 2025

Is there a specific reason why we chose odsc-vllm-v1 as the key and not odsc-vllm?
If I understand correctly, when two or more models are selected and some are compatible with odsc-vllm-v1 while others are compatible with odsc-vllm, the group will be deployed with odsc-vllm-v1. And if all selected models are compatible with odsc-vllm, do we still go ahead and deploy with odsc-vllm-v1?
Correct me if I am wrong. @mrDzurb

Contributor

I see.
I believe odsc-vllm-v1 is preferred in both cases.

Member Author

There is no perfect solution for this, in my opinion. Ideally, we would re-test all service models and update them to use the latest container, but that would be too time-consuming. For now, this is just a best-effort attempt to choose the most recent container family when models from different families are mixed. Hopefully, vLLM will continue to improve, and the enhancement introduced in this PR will become more robust.

@github-actions bot commented Apr 7, 2025

📌 Cov diff with main: 76%

📌 Overall coverage: 58.73%

return GPUShapesIndex(**data)


def get_preferred_compatible_family(selected_families: set[str]) -> str:
Member

nit: use -> Optional[str] instead of str.

@mrDzurb mrDzurb merged commit e65f09c into main Apr 7, 2025
24 checks passed