[WIP][AQUA][GPU Shape Recommendation] Support for Service Managed Models #1252
base: main
Conversation
```python
if request.deployment_config:
    shape_recommendation_report = (
        ShapeRecommendationReport.from_deployment_config(
```
We check whether a `deployment_config` was successfully obtained; if so, we generate the report from it immediately.
```python
DEFAULT_WEIGHT_SIZE = "bfloat16"
```
It looks like we already have such a variable three lines below?
```
@@ -131,6 +131,10 @@ def construct_deployment_params(self) -> str:
    # vLLM only supports 4bit in-flight quantization
    params.append(VLLM_PARAMS["in_flight_quant"])

    # add trust-remote-code if custom modules are specified
    if c.trust_remote_code:
```
Could we use a more meaningful name for the config variable? `c` -> `llm_config`?
```
@@ -207,6 +210,17 @@ def validate_model_support(cls, raw: dict) -> ValueError:
    "Encoder-decoder models (ex. T5, Gemma) and encoder-only (BERT) are not supported at this time."
)

@staticmethod
def get_bool(raw, key, default=False):
```
In `ads/common/utils` we already have a `parse_bool` function; maybe it can be reused here?
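For illustration, a minimal helper along the lines of the proposed `get_bool` might look as follows (a sketch only; the actual `parse_bool` in `ads.common.utils` may differ in name, signature, and accepted values):

```python
def get_bool(raw: dict, key: str, default: bool = False) -> bool:
    """Coerce a raw config value to bool, falling back to `default` when absent."""
    value = raw.get(key)
    if value is None:
        return default
    if isinstance(value, str):
        # Model config files sometimes store booleans as strings.
        return value.strip().lower() in ("true", "1", "yes")
    return bool(value)
```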
```python
tie_word_embeddings = LLMConfig.get_bool(raw, "tie_word_embeddings", True)

trust_remote_code = "auto_map" in raw  # trust-remote-code is always needed when this key is present
```
Could you add more description for the section below? Can the error be more specific?
```python
if None in [
    num_hidden_layers,
    hidden_size,
    vocab_size,
    num_attention_heads,
    head_dim,
]:
    raise ValueError("Missing required value in model config.")
```
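One way to make the error more specific is to report exactly which fields are missing. A sketch, wrapped in a hypothetical helper so the required field names from the snippet above are explicit:

```python
def validate_required_config(raw: dict) -> None:
    """Raise a ValueError naming exactly which required fields are missing."""
    required = [
        "num_hidden_layers",
        "hidden_size",
        "vocab_size",
        "num_attention_heads",
        "head_dim",
    ]
    missing = [name for name in required if raw.get(name) is None]
    if missing:
        raise ValueError(
            f"Model config is missing required fields: {', '.join(missing)}"
        )
```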
```
@@ -93,6 +93,25 @@ def which_shapes(
    shapes = self.valid_compute_shapes(compartment_id=request.compartment_id)

    ds_model = self._validate_model_ocid(request.model_id)
```
I noticed that this method also retrieves model details, but the function name doesn't suggest that it returns the model config as well. I'd recommend separating the retrieval of model details from this function to make its purpose clearer and more consistent.
```
@@ -30,6 +41,10 @@ class RequestRecommend(BaseModel):
    COMPARTMENT_OCID, description="The OCID of user's compartment"
)

deployment_config: Optional[AquaDeploymentConfig] = Field(
```
I think it would be better to do: `default=None`.
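For reference, a reduced sketch of the suggested declaration with an explicit `default=None` (the real `RequestRecommend` carries more fields, and `deployment_config` is typed as `AquaDeploymentConfig` rather than the plain `dict` used here for a self-contained example):

```python
from typing import Optional

from pydantic import BaseModel, Field


class RequestRecommend(BaseModel):
    # Reduced illustration; explicit default=None makes the optional field clear.
    deployment_config: Optional[dict] = Field(
        default=None,
        description="Pre-defined deployment configuration, when available.",
    )
```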
```python
else:
    data = self._get_model_config(ds_model)
```
Not sure why we would need to repeat the same lines again?
```json
{
    "deployment_params": {
        "quantization": "mxfp4",
        "max_model_len": null,
```
This might be confusing: in one case we say that `max_model_len` is null, but in the params it is 130000.
```python
if model:
    model_size = str(model.total_model_gb)
else:
    model_size = "Using Pre-Defined Config"
```
Maybe we should use `-` instead? Should we add a total model GB param in the service configs?
```
@@ -42,7 +57,9 @@ class DeploymentParams(BaseModel):  # noqa: N801
    quantization: Optional[str] = Field(
        None, description="Type of quantization (e.g. 4bit)."
    )
    max_model_len: Optional[int] = Field(None, description="Maximum length of input sequence.")
```
Merge conflict?
```
@@ -231,3 +255,89 @@ class ShapeRecommendationReport(BaseModel):
    None,
    description="Details for troubleshooting if no shapes fit the current model.",
)

@classmethod
def from_deployment_config(cls, deployment_config: AquaDeploymentConfig, model_name: str, valid_shapes: List[ComputeShapeSummary]) -> "ShapeRecommendationReport":
```
Please use the formatter to format the code.
```
@@ -1297,6 +1297,9 @@ def recommend_shape(self, **kwargs) -> Union[Table, ShapeRecommendationReport]:
    AquaValueError
        If model type is unsupported by tool (no recommendation report generated)
    """
    deployment_config = self.get_deployment_config(model_id=kwargs.get("model_id"))
```
Could you update the description and add the acceptable params for kwargs? Could you also add telemetry for the shape recommender; we need to know whether this feature is used by customers.
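One possible shape for the requested docstring update, sketched below. Only `model_id` is confirmed by the diff; the `compartment_id` keyword and the placeholder body are assumptions for illustration:

```python
def recommend_shape(self, **kwargs):
    """Generate GPU shape recommendations for a model.

    Keyword Args:
        model_id (str): OCID of the model to evaluate.
        compartment_id (str, optional): Compartment to list available
            compute shapes from (assumed parameter).

    Returns:
        Union[Table, ShapeRecommendationReport]: a table for CLI display,
        or the raw recommendation report.
    """
    return kwargs.get("model_id")  # placeholder body for illustration
```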
```
@@ -93,6 +93,25 @@ def which_shapes(
    shapes = self.valid_compute_shapes(compartment_id=request.compartment_id)

    ds_model = self._validate_model_ocid(request.model_id)
```
Could you update the description for this method? Specifically, the input params.
```
@@ -344,6 +365,8 @@ def test_which_shapes_valid(
    ],
)
def test_which_shapes_valid_from_file(
```
This test will potentially fail every time we update the GPU index JSON file.
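One way to keep such a test from breaking on every index update is to derive the expected values from the same JSON file the recommender reads, rather than hard-coding them. A sketch under an assumed file structure (`{"shapes": [{"name": ...}, ...]}` is hypothetical; the real GPU index schema may differ):

```python
import json


def expected_shape_names(index_path: str) -> list:
    """Load shape names from the GPU index file so the test tracks the data."""
    with open(index_path) as f:
        data = json.load(f)
    # Assumed structure: {"shapes": [{"name": ...}, ...]}
    return sorted(entry["name"] for entry in data["shapes"])
```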
For AQUA Service models, pre-defined configuration files already exist and can be used directly. This PR adds support for service models by obtaining their pre-defined config files, parsing this metadata, and formatting a response consistent with the current output of the GPU Shape Recommendation Tool.
Returns:

![Example output of the shape recommendation report](https://camo.githubusercontent.com/4c6ae49f9bbc2ca6853bca46a3b635e2e52a38a27ef1ac984adb11846ac42ab1/68747470733a2f2f6769746875622e6f7261636c65636f72702e636f6d2f73746f726167652f757365722f31313037392f66696c65732f62653337623565332d646262652d343862392d383134652d646239643566633434613065)