
Conversation

@elizjo (Member) commented Aug 21, 2025

For AQUA service models, pre-defined deployment configuration files already exist and can be used directly. This PR adds support for service models by fetching their pre-defined config files, parsing that metadata, and formatting a response consistent with the current output of the GPU Shape Recommendation tool.

ads aqua deployment recommend_shape --model_id 'ocid1.datasciencemodel.oc1.<ocid>'

Returns
[Screenshot: CLI output table, Aug 21, 2025]

GET /aqua/deployments/recommend_shapes/<md_ocid>

Returns

{
  "display_name": "meta-llama/Meta-Llama-3.1-8B",
  "recommendations": [
    {
      "shape_details": {
        "available": true,
        "core_count": 112,
        "memory_in_gbs": 1024,
        "name": "BM.GPU.L40S-NC.4",
        "shape_series": "GPU",
        "gpu_specs": {
          "gpu_memory_in_gbs": 192,
          "gpu_count": 4,
          "gpu_type": "L40S",
          "quantization": [
            "awq",
            "gptq",
            "marlin",
            "fp8",
            "int8",
            "bitblas",
            "aqlm",
            "bitsandbytes",
            "deepspeedfp",
            "gguf"
          ],
          "ranking": {
            "cost": 60,
            "performance": 80
          }
        }
      },
      "configurations": [
        {
          "deployment_params": {
            "quantization": "bfloat16",
            "max_model_len": null,
            "params": ""
          },
          "model_details": null,
          "recommendation": "Model fits well within the allowed compute shape."
        }
      ]
    },
    {
      "shape_details": {
        "available": true,
        "core_count": 64,
        "memory_in_gbs": 1024,
        "name": "BM.GPU.A10.4",
        "shape_series": "GPU",
        "gpu_specs": {
          "gpu_memory_in_gbs": 96,
          "gpu_count": 4,
          "gpu_type": "A10",
          "quantization": [
            "awq",
            "gptq",
            "marlin",
            "int8",
            "bitblas",
            "aqlm",
            "bitsandbytes",
            "deepspeedfp",
            "gguf"
          ],
          "ranking": {
            "cost": 50,
            "performance": 50
          }
        }
      },
      "configurations": [
        {
          "deployment_params": {
            "quantization": "bfloat16",
            "max_model_len": null,
            "params": ""
          },
          "model_details": null,
          "recommendation": "Model fits well within the allowed compute shape."
        }
      ]
    },
    {
      "shape_details": {
        "available": true,
        "core_count": 30,
        "memory_in_gbs": 480,
        "name": "VM.GPU.A10.2",
        "shape_series": "GPU",
        "gpu_specs": {
          "gpu_memory_in_gbs": 48,
          "gpu_count": 2,
          "gpu_type": "A10",
          "quantization": [
            "awq",
            "gptq",
            "marlin",
            "int8",
            "bitblas",
            "aqlm",
            "bitsandbytes",
            "deepspeedfp",
            "gguf"
          ],
          "ranking": {
            "cost": 40,
            "performance": 40
          }
        }
      },
      "configurations": [
        {
          "deployment_params": {
            "quantization": "bfloat16",
            "max_model_len": null,
            "params": ""
          },
          "model_details": null,
          "recommendation": "Model fits well within the allowed compute shape."
        }
      ]
    },
    {
      "shape_details": {
        "available": true,
        "core_count": 15,
        "memory_in_gbs": 240,
        "name": "VM.GPU.A10.1",
        "shape_series": "GPU",
        "gpu_specs": {
          "gpu_memory_in_gbs": 24,
          "gpu_count": 1,
          "gpu_type": "A10",
          "quantization": [
            "awq",
            "gptq",
            "marlin",
            "int8",
            "bitblas",
            "aqlm",
            "bitsandbytes",
            "deepspeedfp",
            "gguf"
          ],
          "ranking": {
            "cost": 20,
            "performance": 30
          }
        }
      },
      "configurations": [
        {
          "deployment_params": {
            "quantization": "bfloat16",
            "max_model_len": 4096,
            "params": "--max-model-len 4096"
          },
          "model_details": null,
          "recommendation": "PARAMS: --max-model-len 4096\n\nModel fits well within the allowed compute shape."
        }
      ]
    }
  ],
  "troubleshoot": null
}
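For illustration, the recommendations list above can be post-processed client-side, e.g. to pick the top-ranked shape. A minimal sketch — `best_shape` and the trimmed payload are illustrative, not part of this PR:

```python
import json

# Trimmed version of the response payload shown above (illustrative).
response = json.loads("""
{
  "display_name": "meta-llama/Meta-Llama-3.1-8B",
  "recommendations": [
    {
      "shape_details": {
        "name": "BM.GPU.L40S-NC.4",
        "gpu_specs": {"ranking": {"cost": 60, "performance": 80}}
      },
      "configurations": [{"deployment_params": {"params": ""}}]
    },
    {
      "shape_details": {
        "name": "VM.GPU.A10.1",
        "gpu_specs": {"ranking": {"cost": 20, "performance": 30}}
      },
      "configurations": [{"deployment_params": {"params": "--max-model-len 4096"}}]
    }
  ]
}
""")


def best_shape(report: dict, prefer: str = "performance") -> dict:
    """Return the recommendation whose GPU ranking scores highest on `prefer`."""
    return max(
        report["recommendations"],
        key=lambda rec: rec["shape_details"]["gpu_specs"]["ranking"][prefer],
    )


top = best_shape(response)
print(top["shape_details"]["name"])  # BM.GPU.L40S-NC.4
```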

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Aug 21, 2025
@elizjo elizjo changed the title [GPU Shape Recommendation] Support for Service Managed Models [WIP][GPU Shape Recommendation] Support for Service Managed Models Aug 21, 2025
@elizjo elizjo changed the base branch from main to ODSC-74228/GPU-Shape-Recommendation August 21, 2025 18:08
@elizjo elizjo changed the base branch from ODSC-74228/GPU-Shape-Recommendation to main August 21, 2025 18:09
📌 Cov diff with main: 1%
📌 Overall coverage: 18.28%

📌 Cov diff with main: 1%
📌 Overall coverage: 18.27%

@mrDzurb mrDzurb changed the title [WIP][GPU Shape Recommendation] Support for Service Managed Models [WIP][AQUA][GPU Shape Recommendation] Support for Service Managed Models Aug 26, 2025
📌 Cov diff with main: 1%
📌 Overall coverage: 18.27%

@elizjo elizjo changed the base branch from main to ODSC-74228/GPU-Shape-Recommendation August 27, 2025 18:22
            )
        if request.deployment_config:
            shape_recommendation_report = (
                ShapeRecommendationReport.from_deployment_config(
Member Author:

We check whether a deployment_config was successfully obtained; if so, we generate the report immediately.

@mrDzurb mrDzurb changed the base branch from ODSC-74228/GPU-Shape-Recommendation to main August 28, 2025 00:12
@mrDzurb mrDzurb changed the base branch from main to ODSC-74228/GPU-Shape-Recommendation August 28, 2025 00:13
@elizjo elizjo changed the base branch from ODSC-74228/GPU-Shape-Recommendation to main August 28, 2025 01:36
📌 Cov diff with main: 0%
📌 Overall coverage: 18.26%

}

DEFAULT_WEIGHT_SIZE = "bfloat16"
Member:

It looks like we already have such a variable three lines below?

@@ -131,6 +131,10 @@ def construct_deployment_params(self) -> str:
            # vLLM only supports 4bit in-flight quantization
            params.append(VLLM_PARAMS["in_flight_quant"])

        # add trust-remote-code if custom modules are specified
        if c.trust_remote_code:
Member:

Could we use a more meaningful name for the config variable? c -> llm_config?

@@ -207,6 +210,17 @@ def validate_model_support(cls, raw: dict) -> ValueError:
                "Encoder-decoder models (ex. T5, Gemma) and encoder-only (BERT) are not supported at this time."
            )

    @staticmethod
    def get_bool(raw, key, default=False):
Member:

In ads/common/utils we already have a parse_bool function; maybe this function can be reused?
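As a sketch of that reuse — `parse_bool` here is a stand-in; the real helper in ads/common/utils may differ in signature and accepted values:

```python
# Hypothetical stand-in for a shared parse_bool helper (the real
# ads.common.utils.parse_bool may behave differently).
def parse_bool(value, default=False):
    """Coerce common config representations of a boolean."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() in ("true", "1", "yes")
    return bool(value)


def get_bool(raw, key, default=False):
    """Read a boolean-ish field from a raw model config dict."""
    return parse_bool(raw.get(key), default=default)
```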

tie_word_embeddings = LLMConfig.get_bool(raw, "tie_word_embeddings", True)

trust_remote_code = "auto_map" in raw # trust-remote-code is always needed when this key is present

Member:

Could you add more description for the section below? Can the error be more specific?

        if None in [
            num_hidden_layers,
            hidden_size,
            vocab_size,
            num_attention_heads,
            head_dim,
        ]:
            raise ValueError("Missing required value in model config.")
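One way to make the error more specific (a sketch; the key names are taken from the snippet above) is to report exactly which fields are missing:

```python
def check_required(raw: dict) -> None:
    """Raise a ValueError naming exactly which required config keys are missing."""
    required = [
        "num_hidden_layers",
        "hidden_size",
        "vocab_size",
        "num_attention_heads",
        "head_dim",
    ]
    missing = [key for key in required if raw.get(key) is None]
    if missing:
        raise ValueError(
            f"Model config is missing required value(s): {', '.join(missing)}. "
            "Verify the model's config.json includes these fields."
        )
```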

@@ -93,6 +93,25 @@ def which_shapes(
        shapes = self.valid_compute_shapes(compartment_id=request.compartment_id)

        ds_model = self._validate_model_ocid(request.model_id)
Member:

I noticed that this method also retrieves model details, but the function name doesn’t suggest that it returns the model config as well. I’d recommend separating the retrieval of model details from this function to make its purpose clearer and more consistent.

@@ -30,6 +41,10 @@ class RequestRecommend(BaseModel):
        COMPARTMENT_OCID, description="The OCID of user's compartment"
    )

    deployment_config: Optional[AquaDeploymentConfig] = Field(
Member:

I think it would be better to do: default=None

            )

        else:
            data = self._get_model_config(ds_model)
Member:

Not sure why we would need to repeat the same lines again?

{
  "deployment_params": {
    "quantization": "mxfp4",
    "max_model_len": null,
Member:

This might be confusing: in one place we say max_model_len is null, but in params it is 130000.
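One way to keep the two fields from drifting apart (a sketch; the names are illustrative, not this PR's code) is to derive the params string from the structured fields instead of storing both independently:

```python
def render_params(max_model_len=None, extra=""):
    """Build the serving params string from structured fields so the string
    and the structured values cannot contradict each other."""
    parts = []
    if max_model_len is not None:
        parts.append(f"--max-model-len {max_model_len}")
    if extra:
        parts.append(extra)
    return " ".join(parts)
```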

        if model:
            model_size = str(model.total_model_gb)
        else:
            model_size = "Using Pre-Defined Config"
Member:

Maybe we should use "-" instead? Should we add a total model GB param to the service configs?

@@ -42,7 +57,9 @@ class DeploymentParams(BaseModel):  # noqa: N801
    quantization: Optional[str] = Field(
        None, description="Type of quantization (e.g. 4bit)."
    )
    max_model_len: Optional[int] = Field(None, description="Maximum length of input sequence.")
Member:

Merge conflict?

@@ -231,3 +255,89 @@ class ShapeRecommendationReport(BaseModel):
        None,
        description="Details for troubleshooting if no shapes fit the current model.",
    )

    @classmethod
    def from_deployment_config(cls, deployment_config: AquaDeploymentConfig, model_name: str, valid_shapes: List[ComputeShapeSummary]) -> "ShapeRecommendationReport":
Member:

Please use the formatter to format the code.

@@ -1297,6 +1297,9 @@ def recommend_shape(self, **kwargs) -> Union[Table, ShapeRecommendationReport]:
        AquaValueError
            If model type is unsupported by tool (no recommendation report generated)
        """
        deployment_config = self.get_deployment_config(model_id=kwargs.get("model_id"))
Member:

Could you update the description and add acceptable params for kwargs?

@mrDzurb (Member) commented Aug 28, 2025

Could you also add telemetry for the shape recommender; we need to know whether this feature is used by customers.

@@ -93,6 +93,25 @@ def which_shapes(
        shapes = self.valid_compute_shapes(compartment_id=request.compartment_id)

        ds_model = self._validate_model_ocid(request.model_id)

Member:

Could you update the description for this method? Specifically input params.

@@ -344,6 +365,8 @@ def test_which_shapes_valid(
    ],
)
def test_which_shapes_valid_from_file(
Member:

This test will potentially fail every time we update the GPU index json file.
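One possible mitigation (a sketch; the index structure shown is an assumption, and in the real test it would be loaded from the packaged JSON file) is to assert invariants computed from the index itself rather than hardcoding expected shapes:

```python
# Hypothetical trimmed GPU index; the real test would json.load() the
# packaged index file instead of defining it inline.
gpu_index = {
    "BM.GPU.L40S-NC.4": {"gpu_count": 4, "gpu_memory_in_gbs": 192, "gpu_type": "L40S"},
    "VM.GPU.A10.1": {"gpu_count": 1, "gpu_memory_in_gbs": 24, "gpu_type": "A10"},
}


def index_invariants(index: dict) -> None:
    """Check structural invariants that hold across index updates,
    instead of pinning an exact, hardcoded list of shapes."""
    for name, spec in index.items():
        assert spec["gpu_count"] >= 1, name
        assert spec["gpu_memory_in_gbs"] > 0, name
        assert isinstance(spec["gpu_type"], str) and spec["gpu_type"], name


index_invariants(gpu_index)
```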
