
Conversation

@elizjo (Member) commented Aug 21, 2025

For AQUA service models, pre-defined deployment configuration files already exist and can be used directly. This PR adds support for service models by fetching their pre-defined config files, parsing that metadata, and formatting a response consistent with the current output of the GPU Shape Recommendation tool.

ads aqua deployment recommend_shape --model_id 'ocid1.datasciencemodel.oc1.<ocid>'

Returns
[Screenshot: CLI output table, Aug 21, 2025]

GET /aqua/deployments/recommend_shapes/<md_ocid>

Returns

{
  "display_name": "meta-llama/Meta-Llama-3.1-8B",
  "recommendations": [
    {
      "shape_details": {
        "available": true,
        "core_count": 112,
        "memory_in_gbs": 1024,
        "name": "BM.GPU.L40S-NC.4",
        "shape_series": "GPU",
        "gpu_specs": {
          "gpu_memory_in_gbs": 192,
          "gpu_count": 4,
          "gpu_type": "L40S",
          "quantization": [
            "awq",
            "gptq",
            "marlin",
            "fp8",
            "int8",
            "bitblas",
            "aqlm",
            "bitsandbytes",
            "deepspeedfp",
            "gguf"
          ],
          "ranking": {
            "cost": 60,
            "performance": 80
          }
        }
      },
      "configurations": [
        {
          "deployment_params": {
            "quantization": "bfloat16",
            "max_model_len": null,
            "params": ""
          },
          "model_details": null,
          "recommendation": "Model fits well within the allowed compute shape."
        }
      ]
    },
    {
      "shape_details": {
        "available": true,
        "core_count": 64,
        "memory_in_gbs": 1024,
        "name": "BM.GPU.A10.4",
        "shape_series": "GPU",
        "gpu_specs": {
          "gpu_memory_in_gbs": 96,
          "gpu_count": 4,
          "gpu_type": "A10",
          "quantization": [
            "awq",
            "gptq",
            "marlin",
            "int8",
            "bitblas",
            "aqlm",
            "bitsandbytes",
            "deepspeedfp",
            "gguf"
          ],
          "ranking": {
            "cost": 50,
            "performance": 50
          }
        }
      },
      "configurations": [
        {
          "deployment_params": {
            "quantization": "bfloat16",
            "max_model_len": null,
            "params": ""
          },
          "model_details": null,
          "recommendation": "Model fits well within the allowed compute shape."
        }
      ]
    },
    {
      "shape_details": {
        "available": true,
        "core_count": 30,
        "memory_in_gbs": 480,
        "name": "VM.GPU.A10.2",
        "shape_series": "GPU",
        "gpu_specs": {
          "gpu_memory_in_gbs": 48,
          "gpu_count": 2,
          "gpu_type": "A10",
          "quantization": [
            "awq",
            "gptq",
            "marlin",
            "int8",
            "bitblas",
            "aqlm",
            "bitsandbytes",
            "deepspeedfp",
            "gguf"
          ],
          "ranking": {
            "cost": 40,
            "performance": 40
          }
        }
      },
      "configurations": [
        {
          "deployment_params": {
            "quantization": "bfloat16",
            "max_model_len": null,
            "params": ""
          },
          "model_details": null,
          "recommendation": "Model fits well within the allowed compute shape."
        }
      ]
    },
    {
      "shape_details": {
        "available": true,
        "core_count": 15,
        "memory_in_gbs": 240,
        "name": "VM.GPU.A10.1",
        "shape_series": "GPU",
        "gpu_specs": {
          "gpu_memory_in_gbs": 24,
          "gpu_count": 1,
          "gpu_type": "A10",
          "quantization": [
            "awq",
            "gptq",
            "marlin",
            "int8",
            "bitblas",
            "aqlm",
            "bitsandbytes",
            "deepspeedfp",
            "gguf"
          ],
          "ranking": {
            "cost": 20,
            "performance": 30
          }
        }
      },
      "configurations": [
        {
          "deployment_params": {
            "quantization": "bfloat16",
            "max_model_len": 4096,
            "params": "--max-model-len 4096"
          },
          "model_details": null,
          "recommendation": "PARAMS: --max-model-len 4096\n\nModel fits well within the allowed compute shape."
        }
      ]
    }
  ],
  "troubleshoot": null
}
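For illustration, the recommendations list above can be post-processed client-side, e.g. to pick the top-ranked shape. A minimal sketch — `best_shape` and the trimmed payload are illustrative, not part of this PR:

```python
import json

# Trimmed version of the response payload shown above (illustrative).
response = json.loads("""
{
  "display_name": "meta-llama/Meta-Llama-3.1-8B",
  "recommendations": [
    {
      "shape_details": {
        "name": "BM.GPU.L40S-NC.4",
        "gpu_specs": {"ranking": {"cost": 60, "performance": 80}}
      },
      "configurations": [{"deployment_params": {"params": ""}}]
    },
    {
      "shape_details": {
        "name": "VM.GPU.A10.1",
        "gpu_specs": {"ranking": {"cost": 20, "performance": 30}}
      },
      "configurations": [{"deployment_params": {"params": "--max-model-len 4096"}}]
    }
  ]
}
""")


def best_shape(report: dict, prefer: str = "performance") -> dict:
    """Return the recommendation whose GPU ranking scores highest on `prefer`."""
    return max(
        report["recommendations"],
        key=lambda rec: rec["shape_details"]["gpu_specs"]["ranking"][prefer],
    )


top = best_shape(response)
print(top["shape_details"]["name"])  # BM.GPU.L40S-NC.4
```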

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Aug 21, 2025
@elizjo elizjo changed the title [GPU Shape Recommendation] Support for Service Managed Models [WIP][GPU Shape Recommendation] Support for Service Managed Models Aug 21, 2025
@elizjo elizjo changed the base branch from main to ODSC-74228/GPU-Shape-Recommendation August 21, 2025 18:08
@elizjo elizjo changed the base branch from ODSC-74228/GPU-Shape-Recommendation to main August 21, 2025 18:09
📌 Cov diff with main: 1%
📌 Overall coverage: 18.28%

📌 Cov diff with main: 1%
📌 Overall coverage: 18.27%

@mrDzurb mrDzurb changed the title [WIP][GPU Shape Recommendation] Support for Service Managed Models [WIP][AQUA][GPU Shape Recommendation] Support for Service Managed Models Aug 26, 2025
📌 Cov diff with main: 1%
📌 Overall coverage: 18.27%

@elizjo elizjo changed the base branch from main to ODSC-74228/GPU-Shape-Recommendation August 27, 2025 18:22
            )
        if request.deployment_config:
            shape_recommendation_report = (
                ShapeRecommendationReport.from_deployment_config(
Member Author:

We check whether a deployment_config was successfully obtained; if so, we generate the report immediately.

@mrDzurb mrDzurb changed the base branch from ODSC-74228/GPU-Shape-Recommendation to main August 28, 2025 00:12
@mrDzurb mrDzurb changed the base branch from main to ODSC-74228/GPU-Shape-Recommendation August 28, 2025 00:13
@elizjo elizjo changed the base branch from ODSC-74228/GPU-Shape-Recommendation to main August 28, 2025 01:36
📌 Cov diff with main: 0%
📌 Overall coverage: 18.26%

}

DEFAULT_WEIGHT_SIZE = "bfloat16"
Member:

It looks like we already have such a variable three lines below?

@@ -131,6 +131,10 @@ def construct_deployment_params(self) -> str:
            # vLLM only supports 4bit in-flight quantization
            params.append(VLLM_PARAMS["in_flight_quant"])

        # add trust-remote-code if custom modules are specified
        if c.trust_remote_code:
Member:

Could we use a more meaningful name for the config variable? c -> llm_config?

@@ -207,6 +210,17 @@ def validate_model_support(cls, raw: dict) -> ValueError:
                "Encoder-decoder models (ex. T5, Gemma) and encoder-only (BERT) are not supported at this time."
            )

    @staticmethod
    def get_bool(raw, key, default=False):
Member:

In ads/common/utils we already have a parse_bool function; maybe this function can be reused?
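As a sketch of that reuse — `parse_bool` here is a stand-in; the real helper in ads/common/utils may differ in signature and accepted values:

```python
# Hypothetical stand-in for a shared parse_bool helper (the real
# ads.common.utils.parse_bool may behave differently).
def parse_bool(value, default=False):
    """Coerce common config representations of a boolean."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() in ("true", "1", "yes")
    return bool(value)


def get_bool(raw, key, default=False):
    """Read a boolean-ish field from a raw model config dict."""
    return parse_bool(raw.get(key), default=default)
```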

tie_word_embeddings = LLMConfig.get_bool(raw, "tie_word_embeddings", True)

trust_remote_code = "auto_map" in raw # trust-remote-code is always needed when this key is present

Member:

Could you add more description for the section below? Can the error be more specific?

        if None in [
            num_hidden_layers,
            hidden_size,
            vocab_size,
            num_attention_heads,
            head_dim,
        ]:
            raise ValueError("Missing required value in model config.")
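One way to make the error more specific (a sketch; the key names are taken from the snippet above) is to report exactly which fields are missing:

```python
def check_required(raw: dict) -> None:
    """Raise a ValueError naming exactly which required config keys are missing."""
    required = [
        "num_hidden_layers",
        "hidden_size",
        "vocab_size",
        "num_attention_heads",
        "head_dim",
    ]
    missing = [key for key in required if raw.get(key) is None]
    if missing:
        raise ValueError(
            f"Model config is missing required value(s): {', '.join(missing)}. "
            "Verify the model's config.json includes these fields."
        )
```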

@@ -93,6 +93,25 @@ def which_shapes(
        shapes = self.valid_compute_shapes(compartment_id=request.compartment_id)

        ds_model = self._validate_model_ocid(request.model_id)
Member:

I noticed that this method also retrieves model details, but the function name doesn’t suggest that it returns the model config as well. I’d recommend separating the retrieval of model details from this function to make its purpose clearer and more consistent.

@@ -30,6 +41,10 @@ class RequestRecommend(BaseModel):
        COMPARTMENT_OCID, description="The OCID of user's compartment"
    )

    deployment_config: Optional[AquaDeploymentConfig] = Field(
Member:

I think it would be better to do: default=None

            )

        else:
            data = self._get_model_config(ds_model)
Member:

Not sure why we would need to repeat the same lines again?

{
  "deployment_params": {
    "quantization": "mxfp4",
    "max_model_len": null,
Member:

This might be confusing: in one place we say max_model_len is null, but in params it is 130000.
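One way to keep the two fields from drifting apart (a sketch; the names are illustrative, not this PR's code) is to derive the params string from the structured fields instead of storing both independently:

```python
def render_params(max_model_len=None, extra=""):
    """Build the serving params string from structured fields so the string
    and the structured values cannot contradict each other."""
    parts = []
    if max_model_len is not None:
        parts.append(f"--max-model-len {max_model_len}")
    if extra:
        parts.append(extra)
    return " ".join(parts)
```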

        if model:
            model_size = str(model.total_model_gb)
        else:
            model_size = "Using Pre-Defined Config"
Member:

Maybe we should use "-" instead? Should we add a total model GB param to the service configs?

@@ -42,7 +57,9 @@ class DeploymentParams(BaseModel):  # noqa: N801
    quantization: Optional[str] = Field(
        None, description="Type of quantization (e.g. 4bit)."
    )
    max_model_len: Optional[int] = Field(None, description="Maximum length of input sequence.")
Member:

Merge conflict?

@@ -231,3 +255,89 @@ class ShapeRecommendationReport(BaseModel):
        None,
        description="Details for troubleshooting if no shapes fit the current model.",
    )

    @classmethod
    def from_deployment_config(cls, deployment_config: AquaDeploymentConfig, model_name: str, valid_shapes: List[ComputeShapeSummary]) -> "ShapeRecommendationReport":
Member:

Please use the formatter to format the code.

@@ -1297,6 +1297,9 @@ def recommend_shape(self, **kwargs) -> Union[Table, ShapeRecommendationReport]:
        AquaValueError
            If model type is unsupported by tool (no recommendation report generated)
        """
        deployment_config = self.get_deployment_config(model_id=kwargs.get("model_id"))
Member:

Could you update the description and add acceptable params for kwargs?

@mrDzurb (Member) commented Aug 28, 2025

Could you also add telemetry for the shape recommender; we need to know whether this feature is used by customers.

@@ -93,6 +93,25 @@ def which_shapes(
        shapes = self.valid_compute_shapes(compartment_id=request.compartment_id)

        ds_model = self._validate_model_ocid(request.model_id)

Member:

Could you update the description for this method? Specifically input params.

@@ -344,6 +365,8 @@ def test_which_shapes_valid(
    ],
)
def test_which_shapes_valid_from_file(
Member:

This test will potentially fail every time we update the GPU index json file.
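One possible mitigation (a sketch; the index structure shown is an assumption, and in the real test it would be loaded from the packaged JSON file) is to assert invariants computed from the index itself rather than hardcoding expected shapes:

```python
# Hypothetical trimmed GPU index; the real test would json.load() the
# packaged index file instead of defining it inline.
gpu_index = {
    "BM.GPU.L40S-NC.4": {"gpu_count": 4, "gpu_memory_in_gbs": 192, "gpu_type": "L40S"},
    "VM.GPU.A10.1": {"gpu_count": 1, "gpu_memory_in_gbs": 24, "gpu_type": "A10"},
}


def index_invariants(index: dict) -> None:
    """Check structural invariants that hold across index updates,
    instead of pinning an exact, hardcoded list of shapes."""
    for name, spec in index.items():
        assert spec["gpu_count"] >= 1, name
        assert spec["gpu_memory_in_gbs"] > 0, name
        assert isinstance(spec["gpu_type"], str) and spec["gpu_type"], name


index_invariants(gpu_index)
```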
