[AQUA Telemetry] Update MD Tracking #1193

Merged
VipulMascarenhas merged 21 commits into main from ODSC-70841_update_md_tracking on Jun 18, 2025

agrimk
Member

@agrimk agrimk commented May 22, 2025

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 22, 2025

📌 Cov diff with main:

Coverage-0%

📌 Overall coverage:

Coverage-19.13%

Member

@VipulMascarenhas VipulMascarenhas left a comment

Added some comments.

Can we add some unit tests and also set up the pre-commit hook to format the code changes on commit? Also, can you add some examples of logged events, both successful and unsuccessful, to the description?

deployment_id = deployment.id


deployment_id = deployment.id()
Member

@VipulMascarenhas VipulMascarenhas May 22, 2025

id is a property, so keep it as .id; calling it as .id() will raise a TypeError.
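A minimal illustration of the reviewer's point: when id is defined as a property, plain attribute access already returns the value, so adding parentheses tries to call that value. The Deployment class below is a hypothetical stand-in, not the ADS model deployment class.

```python
class Deployment:
    """Hypothetical object that exposes ``id`` as a read-only property."""

    def __init__(self, ocid: str) -> None:
        self._id = ocid

    @property
    def id(self) -> str:
        # ``deployment.id`` runs this getter and returns the string directly.
        return self._id


deployment = Deployment("example-deployment-ocid")
deployment_id = deployment.id  # correct: plain attribute access

try:
    deployment.id()  # wrong: this tries to call the returned str
except TypeError as err:
    error_text = str(err)  # "'str' object is not callable"
```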

@@ -38,6 +38,7 @@ def __init__(
config: dict = None,
signer: Signer = None,
client_kwargs: dict = None,
_error_message: str = None,
Member

Why do we need a parameter called _error_message here? We can initialize it directly inside __init__:

self._error_message = None
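Here is a sketch of the suggestion using a simplified, hypothetical WorkRequestSketch class. The real class also takes the config, signer, and client_kwargs arguments shown in the diff. In this version the error message is internal state that __init__ initializes, not a constructor argument that callers could set.

```python
from typing import Optional


class WorkRequestSketch:
    """Hypothetical, simplified stand-in for the work-request wrapper in the diff."""

    def __init__(self, work_request_id: str) -> None:
        self.work_request_id = work_request_id
        # Internal state, filled in later (e.g. while syncing a FAILED work
        # request); it is not a constructor argument, so callers cannot preset it.
        self._error_message: Optional[str] = None


wr = WorkRequestSketch("example-work-request-id")
```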



def get_deployment_status(self,model_deployment_id: str, work_request_id : str, model_type : str) :
Member

add docstrings
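One way the requested docstring could look, sketched in NumPy style. That style is an assumption about the project's convention. The summary line and the Returns section are inferred from the telemetry calls elsewhere in this diff and should be checked against what the method actually does. The class is a hypothetical container, and the body is only a placeholder.

```python
class AquaDeploymentAppSketch:
    """Hypothetical container so the method signature matches the diff."""

    def get_deployment_status(
        self, model_deployment_id: str, work_request_id: str, model_type: str
    ) -> None:
        """Wait for a model deployment work request and record its final status.

        Parameters
        ----------
        model_deployment_id : str
            OCID of the model deployment being created.
        work_request_id : str
            OCID of the work request returned by the create call.
        model_type : str
            Model type, used to build the telemetry category
            ``aqua/{model_type}/deployment/status``.

        Returns
        -------
        None
            The outcome is reported as a telemetry event (SUCCEEDED or
            FAILED) rather than returned.
        """
        raise NotImplementedError("sketch: only the docstring is illustrated")
```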

@@ -54,6 +54,8 @@
SERVICE_MANAGED_CONTAINER_URI_SCHEME = "dsmc://"
SUPPORTED_FILE_FORMATS = ["jsonl"]
MODEL_BY_REFERENCE_OSS_PATH_KEY = "artifact_location"
DEFAULT_WAIT_TIME = 1200
Member

Since these constants are specific to model deployment status, we could keep them in the ads.aqua.modeldeployment.constants module.

@@ -80,6 +84,9 @@
from ads.model.model_metadata import ModelCustomMetadataItem
from ads.telemetry import telemetry

THREAD_POOL_SIZE = 16
Member

We can move this constant to the ads.aqua.modeldeployment.constants module.

@@ -80,6 +84,9 @@
from ads.model.model_metadata import ModelCustomMetadataItem
from ads.telemetry import telemetry

THREAD_POOL_SIZE = 16
thread_pool = concurrent.futures.ThreadPoolExecutor(max_workers=THREAD_POOL_SIZE)
Member

I think we should use the common telemetry or the common decorator threadpool instead of creating one here.
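The concern is that every module that creates its own ThreadPoolExecutor adds another set of worker threads. Below is a generic sketch of the shared alternative that assumes nothing about the ADS internals: one lazily created executor that every caller reuses. get_shared_executor is a hypothetical helper, not an existing ADS function. In the PR itself, the existing common telemetry or decorator pool would play this role.

```python
import concurrent.futures
import threading
from typing import Optional

_EXECUTOR: Optional[concurrent.futures.ThreadPoolExecutor] = None
_EXECUTOR_LOCK = threading.Lock()


def get_shared_executor(max_workers: int = 16) -> concurrent.futures.ThreadPoolExecutor:
    """Return one process-wide executor, creating it on first use.

    ``max_workers`` only takes effect on the first call; later calls reuse
    the existing pool unchanged.
    """
    global _EXECUTOR
    with _EXECUTOR_LOCK:  # avoid two threads creating two pools at once
        if _EXECUTOR is None:
            _EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
        return _EXECUTOR


# Two callers (e.g. two modules) receive the same pool instead of building their own.
pool_a = get_shared_executor()
pool_b = get_shared_executor()
result = pool_a.submit(sum, [1, 2, 3]).result()
```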

category=f"aqua/{model_type}/deployment/status",
action="FAILED",
detail="Error creating model deployment"
Member

We should log the error message (_error_message) coming from the work request here instead of a static message. This will let us track the specific reasons why deployments fail.

)

self.telemetry.record_event_async(
Member

@VipulMascarenhas VipulMascarenhas May 22, 2025

Shouldn't this be within an else block? We can use try-except-else here.

category=f"aqua/{model_type}/deployment/status",
action="SUCCEEDED",
detail=" Create model deployment successful",
Member

We can skip this detail; the action "SUCCEEDED" already implies it.
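The sketch below combines the review comments above. The FAILED event carries the work request's own error message. The SUCCEEDED event is sent from the else branch, so it fires only when no exception was raised, and it omits the redundant detail. Only the record_event_async keyword names (category, action, detail) and the category format come from the diff. FakeTelemetry, FakeWorkRequest, and track_deployment_status are hypothetical stand-ins.

```python
from typing import Any, Dict, List, Optional


class FakeTelemetry:
    """Hypothetical telemetry client that collects events in memory."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def record_event_async(
        self, category: str, action: str, detail: Optional[str] = None
    ) -> None:
        # A real client would send this in the background; here it is just stored.
        self.events.append({"category": category, "action": action, "detail": detail})


class FakeWorkRequest:
    """Hypothetical work request that fails when given an error message."""

    def __init__(self, error_message: Optional[str] = None) -> None:
        self._error_message = error_message

    def wait(self) -> None:
        if self._error_message is not None:
            raise RuntimeError(self._error_message)


def track_deployment_status(
    work_request: FakeWorkRequest, telemetry: FakeTelemetry, model_type: str
) -> None:
    category = f"aqua/{model_type}/deployment/status"
    try:
        work_request.wait()
    except Exception:
        # Report the specific reason captured from the work request.
        telemetry.record_event_async(
            category=category,
            action="FAILED",
            detail=work_request._error_message,
        )
    else:
        # Runs only when wait() raised nothing; "SUCCEEDED" needs no detail.
        telemetry.record_event_async(category=category, action="SUCCEEDED")


telemetry = FakeTelemetry()
track_deployment_status(FakeWorkRequest(), telemetry, "custom")
track_deployment_status(FakeWorkRequest("example failure reason"), telemetry, "custom")
```

The model type "custom" and the failure text are placeholders chosen only to exercise both branches.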

Member

@VipulMascarenhas VipulMascarenhas left a comment

Changes look good; I added some minor comments. Can you run the pre-commit hook to take care of some formatting issues?

pip install pre-commit
pre-commit install

self.telemetry.record_event_async(
category=f"aqua/{model_type}/deployment/status",
action="FAILED",
detail=data_science_work_request._error_message
Member

Can _error_message be None for any reason? It might be good to use detail=data_science_work_request._error_message or UNKNOWN to avoid unforeseen issues in telemetry logging.
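Here is a small sketch of the suggested guard. Python's or returns its right operand whenever the left one is falsy, so the fallback also replaces None or an empty string, and the telemetry detail is never missing. UNKNOWN_ERROR and its value are hypothetical. The project's own UNKNOWN constant may hold a different value.

```python
from typing import Optional

UNKNOWN_ERROR = "UNKNOWN"  # hypothetical fallback label


def failure_detail(error_message: Optional[str]) -> str:
    """Return the work request error message, or a fallback if none was captured."""
    # ``or`` also replaces an empty string, which carries no information either.
    return error_message or UNKNOWN_ERROR
```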

@@ -78,6 +79,7 @@ def _sync(self):
self._percentage= work_request.percent_complete
self._status = work_request.status
self._description = work_request_logs[-1].message if work_request_logs else "Processing"
if work_request.status == 'FAILED' : self._error_message = self.client.list_work_request_errors
Member

might be good to show an example output for failed and successful MD in the PR description.
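There is a separate issue in the _sync line quoted above. As written, `self._error_message = self.client.list_work_request_errors` assigns the bound method without calling it, so _error_message would hold a method object rather than text. In the OCI Python SDK, DataScienceClient.list_work_request_errors(work_request_id) returns a response whose data is a list of WorkRequestError objects, each with a message attribute. This assumes self.client is that client. The sketch below turns such a list into one string. It uses a hypothetical namedtuple in place of the SDK objects so it runs offline, and join_error_messages is an illustrative helper name.

```python
from collections import namedtuple
from typing import Iterable

# Stand-in for oci.data_science.models.WorkRequestError (code, message, timestamp).
WorkRequestErrorStub = namedtuple("WorkRequestErrorStub", ["code", "message", "timestamp"])


def join_error_messages(errors: Iterable) -> str:
    """Combine the non-empty ``message`` of each work request error into one string."""
    return "; ".join(err.message for err in errors if getattr(err, "message", None))


# In the real code the list would come from actually calling the client, e.g.:
#   errors = self.client.list_work_request_errors(work_request.id).data
errors = [
    WorkRequestErrorStub("ErrorCode1", "first placeholder error", None),
    WorkRequestErrorStub("ErrorCode2", "second placeholder error", None),
]
error_message = join_error_messages(errors)
```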

Member

Also let's use ruff formatter to format the code.

mrDzurb changed the title from "Odsc 70841 update md tracking" to "[AQUA Telemetry] Update MD Tracking" on Jun 4, 2025
)
except Exception as e:
logger.error("Error while trying to create model deployment: " + str(e))
print("Error while trying to create model deployment: " + str(e))
Member

Let's clean up the print statement.
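Here is a minimal sketch of the cleanup: keep only the logger call, because the logging configuration already decides whether the message reaches the console. create_deployment is a hypothetical placeholder for the real call. The boolean return value exists only to make the sketch easy to check and is not taken from the PR.

```python
import logging

logger = logging.getLogger(__name__)


def create_deployment() -> None:
    """Hypothetical placeholder for the real deployment call."""
    raise ValueError("placeholder failure")


def deploy() -> bool:
    try:
        create_deployment()
    except Exception as e:
        # No print(): the logger alone decides where the message goes.
        # %-style arguments are only formatted if the record is actually emitted.
        logger.error("Error while trying to create model deployment: %s", e)
        return False
    return True
```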

@@ -81,7 +81,8 @@ def record_event(
# Here `endpoint`` is for debugging purpose
# For some federated/domain users, the `endpoint` may not be a valid URL
endpoint = f"{self.service_endpoint}/n/{self.namespace}/b/{self.bucket}/o/telemetry/{category}/{action}"
logger.debug(f"Sending telemetry to endpoint: {endpoint}")
logger.info(f"Sending telemetry to endpoint: {endpoint}")
Member

let's use debug instead

github-actions bot commented Jun 6, 2025

📌 Cov diff with main:

Coverage-0%

📌 Overall coverage:

Coverage-18.94%

github-actions bot commented Jun 9, 2025

📌 Cov diff with main:

Coverage-3%

📌 Overall coverage:

Coverage-18.93%

📌 Cov diff with main:

Coverage-2%

📌 Overall coverage:

Coverage-18.80%

mrDzurb
mrDzurb previously approved these changes Jun 18, 2025
@@ -8,3 +8,6 @@

This module contains constants used in Aqua Model Deployment.
"""

DEFAULT_WAIT_TIME = 12000
Member

Is this in seconds?

Member

I think there's an extra zero here, should have been 1200 :)
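To make the unit question concrete, the sketch below assumes the value is in seconds, as the reviewer's suggested 1200 implies. On that assumption, 1200 s is 20 minutes, while the 12000 in the diff would be 3 hours 20 minutes. Converting with the standard library's timedelta makes a stray zero easy to spot.

```python
from datetime import timedelta

# Maximum time to wait for the model deployment work request, in seconds (assumed unit).
DEFAULT_WAIT_TIME = 1200

print(timedelta(seconds=DEFAULT_WAIT_TIME))  # 0:20:00
print(timedelta(seconds=12000))              # 3:20:00, the value in the diff
```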

# {l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, ' '{rate_fmt}{postfix}]
# customize the bar format to remove the {n_fmt}/{total_fmt} from the right side
DEFAULT_BAR_FORMAT = '{l_bar}{bar}| [{elapsed}<{remaining}, ' '{rate_fmt}{postfix}]'
DEFAULT_BAR_FORMAT = "{l_bar}{bar}| [{elapsed}<{remaining}, " "{rate_fmt}{postfix}]"
Member

Can't it just be "{l_bar}{bar}| [{elapsed}<{remaining}, {rate_fmt}{postfix}]"?
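The reviewer's simplification is safe. Python concatenates adjacent string literals at compile time, so the two-literal form in the diff and a single literal produce exactly the same string. The split is left over from the two-part format quoted in the comment above the constant. The variable names below are only for comparison.

```python
# Adjacent string literals are joined by the parser into one string.
TWO_LITERALS = "{l_bar}{bar}| [{elapsed}<{remaining}, " "{rate_fmt}{postfix}]"
ONE_LITERAL = "{l_bar}{bar}| [{elapsed}<{remaining}, {rate_fmt}{postfix}]"

print(TWO_LITERALS == ONE_LITERAL)  # True
```

Either form can be passed unchanged as tqdm's bar_format argument.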

@mrDzurb
Member

mrDzurb commented Jun 18, 2025

Please don't use the Jira ticket as a reference. It would be better to add a description to the PR itself.

Member

@VipulMascarenhas VipulMascarenhas left a comment

The imports for MultiModelDeploymentConfigLoader seem to be incorrect; it should be imported from config_loader, not utils.

@mrDzurb
Member

mrDzurb commented Jun 18, 2025

@agrimk Could you also add a couple of screenshots with the test results?

📌 Cov diff with main:

Coverage-3%

📌 Overall coverage:

Coverage-18.80%

📌 Cov diff with main:

Coverage-66%

📌 Overall coverage:

Coverage-58.28%

@VipulMascarenhas VipulMascarenhas merged commit dee3591 into main Jun 18, 2025
24 checks passed
@VipulMascarenhas VipulMascarenhas deleted the ODSC-70841_update_md_tracking branch June 18, 2025 19:46