Feature/multi model artifact handler #869

YashPandit4u · 2024-05-27T19:49:53Z

Adding a new class to easily handle multi-model artifact creation in ADS.

Signed-off-by: Yash Pandit <[email protected]>

oracle-contributor-agreement · 2024-05-27T19:49:59Z

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
The following contributors of this PR have not signed the OCA:

PR author: YashPandit4u

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.

VipulMascarenhas

I left a few comments. I think we should reuse most of the implementation from the DataScienceModel class instead of recreating it here. A high-level design would look like:

from ads.model.datascience_model import DataScienceModel

class DataScienceModelCollection(DataScienceModel):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # No need to add self.modeldescriptionjson; this attribute is already present in DataScienceModel.

    def add(self, *args, **kwargs):
        ...

    def remove(self, *args, **kwargs):
        ...

    def show(self):
        ...

The usage would look like:

    >>> model = (DataScienceModelCollection()
    ...    .with_compartment_id()
    ...    .with_project_id()
    ...    .with_display_name()
    ...    .with_description()
    ...    .with_freeform_tags()
    ...    .with_artifact("model1_path", "model2_path"))

    >>> model.add(...)
    >>> model.remove(...)
    >>> model.create(model_by_reference=True)

On second thought, another option is to have add() and remove() within DataScienceModel itself; then we wouldn't need the collection class. The add/remove functions only manipulate the model_file_description property that represents multiple models, so a separate subclass may not be required.

Any thoughts on this?

VipulMascarenhas · 2024-06-03T22:11:28Z

ads/model/model_description.py

+# from ads.common import logger
+import logging
+
+logger = logging.getLogger("ads.model_description")


Can we reuse the logger from ads.common?

When I use the logger from ads.common, I don't see any info messages, and for some methods in this class I want to show info messages.
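One plausible cause of missing info messages is the effective level: a logger with no level of its own inherits the level of its nearest configured ancestor, and if that is WARNING, `logger.info(...)` is silently dropped. The sketch below uses only the standard `logging` module; the logger names are hypothetical stand-ins, and whether the `ads.common` logger is actually configured at WARNING is an assumption, not something this thread confirms.

```python
import logging

# Hypothetical package logger configured at WARNING (the real ADS setup may differ).
parent = logging.getLogger("example_pkg")
parent.setLevel(logging.WARNING)

# A child logger without its own level inherits WARNING from its parent,
# so INFO records are filtered out before reaching any handler.
child = logging.getLogger("example_pkg.model_description")
info_enabled_before = child.isEnabledFor(logging.INFO)

# An explicit level on the child re-enables INFO for this logger only,
# leaving the parent's configuration untouched.
child.setLevel(logging.INFO)
info_enabled_after = child.isEnabledFor(logging.INFO)

print(info_enabled_before, info_enabled_after)  # False True
```

A handler is also needed to actually see the output: with no handlers configured anywhere, Python's last-resort handler only prints WARNING and above.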

VipulMascarenhas · 2024-06-03T22:17:40Z

ads/model/model_description.py

+        # Remove if the model already exists
+        self.remove(namespace, bucket, prefix)
+
+        def checkIfFileExists(fileName):


Can we reuse is_path_exists from ads.common.utils instead?

VipulMascarenhas · 2024-06-03T23:30:19Z

ads/model/model_description.py

+            return isExists
+
+        # Function to un-paginate the api call with while loop
+        def listObjectVersionsUnpaginated():


Can ads.common.object_storage_details.list_object_versions be used instead?
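The while-loop "un-pagination" under discussion follows a common pattern: request a page, collect its items, and follow the next-page token until none is returned. The sketch below is library-agnostic; `fetch_page` is a hypothetical callable standing in for a paginated Object Storage call, and its `(items, next_page)` return shape is an assumption, not the OCI SDK's response type.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical signature: given a page token (None for the first page),
# return the items on that page and the next page token (None when done).
PageFetcher = Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]


def list_all_unpaginated(fetch_page: PageFetcher) -> List[Any]:
    """Follow next-page tokens until exhausted and return all items."""
    items: List[Any] = []
    page: Optional[str] = None
    while True:
        batch, page = fetch_page(page)
        items.extend(batch)
        if page is None:
            break
    return items


# Fake three-page listing, used only to exercise the helper.
_PAGES = {
    None: (["a.bin", "b.bin"], "p2"),
    "p2": (["c.bin"], "p3"),
    "p3": (["d.json"], None),
}

all_items = list_all_unpaginated(lambda token: _PAGES[token])
print(all_items)  # ['a.bin', 'b.bin', 'c.bin', 'd.json']
```

The OCI Python SDK also ships `oci.pagination.list_call_get_all_results`, which may make a hand-written loop unnecessary, in line with the reviewer's suggestion to reuse existing helpers.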

VipulMascarenhas · 2024-06-03T23:41:16Z

ads/model/model_description.py

+                logger.error(f"An unexpected error occurred: {e}")
+                raise e
+
+    def add(self, namespace, bucket, prefix=None, files=None):


I recommend using type hints for all the functions, for example:
def add(self, namespace: str, bucket: str, ...) -> None:

Added type hints.

VipulMascarenhas · 2024-06-03T23:44:50Z

ads/model/model_description.py

+        display_name = (
+            "Created by MMS SDK on "
+            + datetime.datetime.now(pytz.utc).strftime("%Y-%m-%d %H:%M:%S %Z")
+            if (display_name == None)


nit: use if display_name: instead?

The suggested check would also trigger for a blank string, not only for None.
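The distinction here is between an explicit `None` check and a truthiness check: `not display_name` is also true for `""`, so a blank name would be replaced by the default. The hypothetical helper below sketches the `is None` variant (also preferred over `== None` in idiomatic Python). The default-name text mirrors the diff, but it uses the standard library's `datetime.timezone.utc` in place of `pytz`, which is an assumption for the sake of a self-contained example.

```python
import datetime
from typing import Optional


def resolve_display_name(display_name: Optional[str]) -> str:
    """Return a default name only when no name was passed at all."""
    if display_name is None:  # `is None`, not `== None`; "" is kept as-is
        now = datetime.datetime.now(datetime.timezone.utc)
        return "Created by MMS SDK on " + now.strftime("%Y-%m-%d %H:%M:%S %Z")
    return display_name


print(repr(resolve_display_name("")))    # ''
print(resolve_display_name("My model"))  # My model
print(resolve_display_name(None))        # e.g. "Created by MMS SDK on <timestamp> UTC"
```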

Abh1navKumar · 2024-06-04T04:56:39Z

ads/model/model_description.py

+                logger.error(f"An unexpected error occurred: {e}")
+                raise e
+
+    def add(self, namespace, bucket, prefix=None, files=None):


mrDzurb

I may have misunderstood what we're trying to achieve in this PR, but wouldn't a user experience like the following be more streamlined?

model_description = ModelDescription().add("oci://bucket_name@namespace/prefix1", "oci://bucket_name@namespace/person_prefix2")

Here, ModelDescription acts as a builder for the JSON configuration needed.
It still can contain all the introduced methods in this PR: add, remove, build, print...

Instead of .build, I would prefer .to_json(uri="path/to/the/json"), where the JSON is returned as the result of the function if the user doesn't provide a uri. Adding .to_dict() might be useful as well.
In this case, we don't need the .show() method.

We could then enhance the DataScienceModel class with:

DataScienceModel().with_model_description(model_description).create()

This approach leverages the existing DataScienceModel class, which already supports CRUD operations for models, so we wouldn't need to replicate these in the ModelDescription class. The purpose of ModelDescription would be to simplify the preparation of the correct JSON configuration for users.

Perhaps we could consider a clearer name, as "ModelDescription" might be somewhat confusing. How about "ModelArtifactConfigBuilder", to more accurately reflect its purpose as a tool for constructing model artifact configurations?

Then the user experience would be:

model_artifact_config = ModelArtifactConfigBuilder().add().add()

DataScienceModel().with_model_artifact(model_artifact_config).create()
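A minimal sketch of the builder idea. `ModelArtifactConfigBuilder` is the name proposed in this comment, but everything else is an assumption: `add()` takes an already-parsed namespace, bucket, and prefix rather than the `oci://` URIs shown above, no objects are looked up, and the JSON keys (`version`, `type`, `models`, `bucketName`, ...) are assumed to follow the model-by-reference description format rather than being taken from this PR. `to_json()` returns the JSON string when no `uri` is given, as suggested; writing to a local path stands in for whatever storage a real implementation would use.

```python
import json
from typing import Any, Dict, List, Optional


class ModelArtifactConfigBuilder:
    """Hypothetical builder for a model-by-reference description JSON."""

    def __init__(self) -> None:
        self._models: List[Dict[str, Any]] = []

    def add(self, namespace: str, bucket: str, prefix: str = "") -> "ModelArtifactConfigBuilder":
        # Replace any existing entry for the same location, then append a new one.
        self.remove(namespace, bucket, prefix)
        self._models.append(
            {"namespace": namespace, "bucketName": bucket, "prefix": prefix, "objects": []}
        )
        return self  # returning self enables .add().add() chaining

    def remove(self, namespace: str, bucket: str, prefix: str = "") -> "ModelArtifactConfigBuilder":
        key = (namespace, bucket, prefix)
        self._models = [
            m for m in self._models if (m["namespace"], m["bucketName"], m["prefix"]) != key
        ]
        return self

    def to_dict(self) -> Dict[str, Any]:
        return {"version": "1.0", "type": "modelOSSReferenceDescription", "models": list(self._models)}

    def to_json(self, uri: Optional[str] = None) -> Optional[str]:
        content = json.dumps(self.to_dict(), indent=2)
        if uri is None:
            return content  # no destination given: return the JSON itself
        with open(uri, "w") as f:
            f.write(content)
        return None


config = (
    ModelArtifactConfigBuilder()
    .add("my-namespace", "my-bucket", "prefix1")
    .add("my-namespace", "my-bucket", "prefix2")
    .remove("my-namespace", "my-bucket", "prefix1")
)
print([m["prefix"] for m in config.to_dict()["models"]])  # ['prefix2']
```

Keeping the builder free of service calls is what makes the split proposed above work: the builder only produces configuration, while CRUD stays in the existing model class.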

cc: @mayoor @VipulMascarenhas

YashPandit4u · 2024-06-05T10:21:37Z

Hi @VipulMascarenhas,

I have now extended the class already present in ADS, and the usage of the utility now looks like this:

import ads
from ads.model.model_collection import DataScienceModelCollection

ads.set_auth(auth="security_token")

mdes = (
    DataScienceModelCollection()
    .with_ref_model_id(model_ocid="ocid1.datasciencemodel.oc1.iad.amaaaaaav66vvnialtuoq5y4ltdz26bxhlow4uh5w544ullowe3jl4vwuayq")
    .with_display_name("New Design 8")
    .with_compartment_id("ocid1.tenancy.oc1..aaaaaaaahzy3x4boh7ipxyft2rowu2xeglvanlfewudbnueugsieyuojkldq")
    .with_project_id("ocid1.datascienceproject.oc1.iad.amaaaaaav66vvniaqsyu2nufljutkn4rzth2nz4q3zqslirct7eayl5ojpma")
)

# Add the objects under this prefix to the model description, then remove them again.
mdes.add(namespace="ociodscdev", bucket="unzip-multi-model", prefix="model-linear-1")
mdes.remove(namespace="ociodscdev", bucket="unzip-multi-model", prefix="model-linear-1")

# Create the model using the inherited DataScienceModel.create().
mdes.create()

Please take a look when free and give your comments.

Thanks and Regards
Yash Pandit

mrDzurb · 2024-06-05T16:28:36Z

ads/model/model_collection.py

+from typing import List, Optional
+import logging
+
+logger = logging.getLogger("ads.model_description")


Wouldn't logger = logging.getLogger(__name__) be better?
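With `__name__`, the logger name mirrors the module path (for example `ads.model.model_collection`), so log records show which file emitted them and the logger inherits whatever level is configured on the `ads` package logger. A hard-coded name like `ads.model_description` also sits under `ads`, but no longer matches the file it lives in. The standard-library illustration below passes the module name explicitly to simulate what `__name__` would be inside that module; configuring `ads` at INFO is purely for the demonstration.

```python
import logging

# Inside ads/model/model_collection.py, __name__ would evaluate to this string.
module_name = "ads.model.model_collection"

# Illustrative package-level configuration (assumption, not ADS's actual setup).
package_logger = logging.getLogger("ads")
package_logger.setLevel(logging.INFO)

logger = logging.getLogger(module_name)

# The module logger has no level of its own, so it inherits INFO from "ads".
print(logger.name)                                   # ads.model.model_collection
print(logger.getEffectiveLevel() == logging.INFO)    # True
```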

mrDzurb · 2024-06-05T16:33:33Z

ads/model/model_collection.py

+        # Remove if the model already exists
+        self.remove(namespace, bucket, prefix)
+
+        def check_if_file_exists(fileName):


Please double-check whether existing utils methods already implement this functionality.

VipulMascarenhas

@YashPandit4u I suggested a few changes; the code just needs to be reorganized to make it consistent with ADS usage.

VipulMascarenhas · 2024-06-11T00:33:01Z

ads/model/model_collection.py

+            raise e
+        return self
+
+    def add(


Let's rename this to add_artifact(uri: str, files: Optional[List[str]] = None), which takes a uri and a list of files as input, to be consistent with how paths are referred to within ADS.

The user won't have the uri handy, so we have parameters like namespace and bucket to make it easy.
Will it be possible to keep these parameters? The function signature would look like:

def add_artifact(
    self,
    namespace: str,
    bucket: str,
    prefix: Optional[str] = None,
    files: Optional[List[str]] = None,
)

Isn't the uri the same as f"oci://{bucketName}@{namespace}/{prefix}", all of whose parts are inputs to this function? My concern is that these inputs make it inconsistent with all other DataScienceModel and GenericModel operations in ADS, and we wouldn't be able to change that later on.
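As this comment notes, the URI and the three parameters carry the same information. A standard-library sketch of the round trip, with hypothetical helper names; the thread later settles on `ObjectStorageDetails.from_path(uri)` for the parsing side in ADS, so treat this only as an illustration of the `oci://{bucket}@{namespace}/{prefix}` layout.

```python
from typing import Optional, Tuple


def build_oci_uri(namespace: str, bucket: str, prefix: Optional[str] = None) -> str:
    """Compose oci://<bucket>@<namespace>/<prefix> (hypothetical helper)."""
    return f"oci://{bucket}@{namespace}/{prefix or ''}"


def parse_oci_uri(uri: str) -> Tuple[str, str, str]:
    """Split an oci:// URI into (namespace, bucket, prefix) (hypothetical helper)."""
    scheme = "oci://"
    if not uri.startswith(scheme):
        raise ValueError(f"Not an OCI Object Storage URI: {uri!r}")
    location, _, prefix = uri[len(scheme):].partition("/")
    bucket, sep, namespace = location.partition("@")
    if not (sep and bucket and namespace):
        raise ValueError(f"Expected oci://<bucket>@<namespace>/<prefix>, got {uri!r}")
    return namespace, bucket, prefix


uri = build_oci_uri("ociodscdev", "unzip-multi-model", "model-linear-1")
print(uri)                 # oci://unzip-multi-model@ociodscdev/model-linear-1
print(parse_oci_uri(uri))  # ('ociodscdev', 'unzip-multi-model', 'model-linear-1')
```

Since one form converts losslessly into the other, a method can accept either and normalize internally, which is where the PR eventually ends up.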
@mrDzurb what do you think?

VipulMascarenhas · 2024-06-11T00:39:00Z

ads/model/model_collection.py

+                "objects": objects,
+            }
+        )
+        self.set_spec(self.CONST_MODEL_FILE_DESCRIPTION, tmp_model_file_description)


Can we modify _prepare_file_description_artifact to take optional params for the files list and the content JSON? This would replace the above functions check_if_file_exists and list_obj_versions_unpaginated altogether.

If content is passed, there is no need to create the dict within that function. For example:

if not content:
    content = dict()
    content["version"] = MODEL_BY_REFERENCE_VERSION
    ...

In this case, _prepare_file_description_artifact would take inputs like:

def _prepare_file_description_artifact(bucket_uri: list, content: Optional[Dict] = None, files: Optional[List[List[str]]] = None)

The input for files isn't ideal; it might be better to implement a dataclass for this and pass a List[ModelFileDescriptionArtifact] instead of the bucket_uri list.

@dataclass
class ModelFileDescriptionArtifact:
    bucket_uri: str
    files: Optional[List[str]] = None

This isn't urgent, though; we can update this later on.
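A rough sketch of the suggested refactor, built around the dataclass from this comment. Everything else is illustrative: the function name is hypothetical (not the private ADS method), objects are not looked up in Object Storage, `MODEL_BY_REFERENCE_VERSION = "1.0"` and the JSON key names are assumed, and the bucket URI is parsed with a simple split. It shows the two behaviors requested: reuse a passed-in `content` dict, otherwise start a fresh one.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

MODEL_BY_REFERENCE_VERSION = "1.0"  # assumed value, for illustration only


@dataclass
class ModelFileDescriptionArtifact:
    bucket_uri: str  # e.g. "oci://<bucket>@<namespace>/<prefix>"
    files: Optional[List[str]] = None


def prepare_file_description(
    artifacts: List[ModelFileDescriptionArtifact],
    content: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Append one model entry per artifact to `content` (hypothetical helper)."""
    if not content:
        content = dict()
        content["version"] = MODEL_BY_REFERENCE_VERSION
        content["type"] = "modelOSSReferenceDescription"
        content["models"] = []
    for artifact in artifacts:
        location, _, prefix = artifact.bucket_uri[len("oci://"):].partition("/")
        bucket, _, namespace = location.partition("@")
        content["models"].append(
            {
                "namespace": namespace,
                "bucketName": bucket,
                "prefix": prefix,
                # Placeholder: real code would look up each object's full name, version and size.
                "objects": [{"name": name} for name in (artifact.files or [])],
            }
        )
    return content


desc = prepare_file_description(
    [ModelFileDescriptionArtifact("oci://my-bucket@my-ns/model-a", ["model.pkl"])]
)
desc = prepare_file_description(
    [ModelFileDescriptionArtifact("oci://my-bucket@my-ns/model-b")], content=desc
)
print([m["prefix"] for m in desc["models"]])  # ['model-a', 'model-b']
```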

Thanks for this suggestion.
Would it be okay to do this in the next PR, given the customer deadlines?

Sure, let's add a TODO somewhere to make sure this gets done.

VipulMascarenhas · 2024-06-11T00:39:47Z

ads/model/model_collection.py

+        )
+        self.set_spec(self.CONST_MODEL_FILE_DESCRIPTION, tmp_model_file_description)
+
+    def remove(self, namespace: str, bucket: str, prefix: Optional[str] = None):


Same as add_artifact: rename this to remove_artifact(uri: str), which takes a uri, to be consistent with how paths are referred to within ADS.
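For reference, the removal step in the diff (`self.model_file_description["models"].pop(modelSearchIdx)`) amounts to finding the entry whose namespace, bucket, and prefix match and dropping it. A standalone sketch over a plain dict, with a hypothetical function name and assumed key names (`namespace`, `bucketName`, `prefix`):

```python
from typing import Any, Dict, Optional


def remove_model_entry(
    description: Dict[str, Any], namespace: str, bucket: str, prefix: Optional[str] = None
) -> bool:
    """Remove the first matching model entry in place; return True if one was removed."""
    prefix = prefix or ""
    models = description.get("models", [])
    for idx, model in enumerate(models):
        if (
            model.get("namespace") == namespace
            and model.get("bucketName") == bucket
            and model.get("prefix", "") == prefix
        ):
            models.pop(idx)  # safe: we return right after mutating the list
            return True
    return False  # nothing matched; the caller decides whether to log or raise


description_json = {
    "models": [
        {"namespace": "ns", "bucketName": "b", "prefix": "model-1"},
        {"namespace": "ns", "bucketName": "b", "prefix": "model-2"},
    ]
}
print(remove_model_entry(description_json, "ns", "b", "model-1"))  # True
print([m["prefix"] for m in description_json["models"]])           # ['model-2']
print(remove_model_entry(description_json, "ns", "b", "missing"))  # False
```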

VipulMascarenhas · 2024-06-11T00:41:09Z

ads/model/model_collection.py

+            # model found case
+            self.model_file_description["models"].pop(modelSearchIdx)
+
+    def create(self):


This isn't needed, since we can just use the existing create method.

Yes, removed this.

VipulMascarenhas · 2024-06-11T01:06:45Z

ads/model/model_collection.py

+        logger.info("Model Artifact stored successfully.")
+        return os.path.abspath(file_path)
+
+    def show(self) -> str:


This is covered by the model.model_file_description property.

Yes, removing this.

VipulMascarenhas · 2024-06-12T00:42:54Z

@YashPandit4u thanks for making all the changes here; it looks good overall. One concern I have is around the path inputs: it would be good to accept the OCI path, as I mentioned in the comments above, since the user can easily construct it from the params (bucket, namespace, and prefix). Also, could you please add a couple of unit tests in test_datascience_model.py for this change?

cc: @mrDzurb

mrDzurb · 2024-06-12T00:45:49Z

ads/model/datascience_model.py

+
+    def add_artifact(
+        self,
+        namespace: str,


For consistency it would be better to accept OCIFS "oci://" path.

I have changed both add_artifact and remove_artifact to take this uri-type input.

mrDzurb · 2024-06-12T00:51:29Z

ads/model/datascience_model.py

+            self.set_spec(self.CONST_MODEL_FILE_DESCRIPTION, self.empty_json)
+
+        # Get object storage client
+        authData = default_signer()


Technically the auth object is already constructed and accessible by

self.dsc_model.auth

Thanks, using this now.

VipulMascarenhas

@YashPandit4u thanks for the update. There are a few improvements to make for the next update:

check_if_file_exists and list_obj_versions_unpaginated should reuse existing methods.
_extract_oci_uri_components isn't required; as mentioned in previous comments, we could just use ObjectStorageDetails.from_path(uri) to get the bucket name, namespace, and prefix. If we want to add validation, ObjectStorageDetails has a few static methods, such as is_valid_uri, to do this.

YashPandit4u · 2024-06-13T17:38:58Z

Hi @VipulMascarenhas, thanks for pointing out the improvement items.

I have now removed the "_extract_oci_uri_components" method and used the already present ObjectStorageDetails.from_path(uri) to get the namespace, prefix, and filepath. Thanks for the suggestion.

VipulMascarenhas

lgtm 👍

YashPandit4u · 2024-06-14T19:10:51Z

Hi @VipulMascarenhas,
Based on the management decision, I have reverted the method signatures to the ones suggested by the customer.

YashPandit4u · 2024-06-15T10:38:21Z

Hi @VipulMascarenhas,

I have now added options for both uri and (namespace, bucket, prefix) in add_artifact and remove_artifact.

mrDzurb · 2024-06-17T17:26:06Z

ads/model/datascience_model.py

@@ -1466,3 +1467,226 @@ def _download_file_description_artifact(self) -> Tuple[Union[str, List[str]], in
            bucket_uri.append(uri)

        return bucket_uri[0] if len(bucket_uri) == 1 else bucket_uri, artifact_size
+
+    def add_artifact(


NIT: Could you please add an examples section for the add and delete artifacts feature?

mrDzurb · 2024-06-17T17:27:35Z

ads/model/datascience_model.py

+                if object_storage_details.filepath == ""
+                else object_storage_details.filepath
+            )
+        if (not namespace) or (not bucket):


NIT: if not (namespace and bucket):

mrDzurb · 2024-06-17T17:29:55Z

ads/model/datascience_model.py

+                else object_storage_details.filepath
+            )
+        if (not namespace) or (not bucket):
+            raise ValueError("Both 'namespace' and 'bucket' must be provided.")


NIT: Wouldn't it be better to say something like "Artifacts cannot be added. Both 'namespace' and 'bucket' parameters must be provided."? Otherwise it would be hard for the user to understand where they need to provide the namespace and bucket.

mrDzurb · 2024-06-17T17:33:19Z

ads/model/datascience_model.py

+        - If no objects are found to add to the model description, a ValueError is raised.
+        """
+
+        if uri and (namespace or bucket):


Maybe we can do something like this?

if uri and (namespace or bucket):
    raise ValueError("Artifacts cannot be added. Please provide either a 'uri' alone, or 'namespace' and 'bucket' together, but not 'uri' with 'namespace' or 'bucket'.")
elif not (uri or (namespace and bucket)):
    raise ValueError("Artifacts cannot be added. You must provide either a 'uri' or both 'namespace' and 'bucket'.")
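These two checks could live in one small standalone validator, so the same rules and messages serve both add_artifact and remove_artifact and can be unit tested directly. A sketch with a hypothetical function name; the `action` parameter is an illustrative addition so each caller can say which operation failed.

```python
from typing import Optional


def validate_location_args(
    uri: Optional[str] = None,
    namespace: Optional[str] = None,
    bucket: Optional[str] = None,
    action: str = "added",
) -> None:
    """Raise ValueError unless exactly one of `uri` or (`namespace`, `bucket`) is given."""
    if uri and (namespace or bucket):
        raise ValueError(
            f"Artifacts cannot be {action}. Please provide either a 'uri' alone, or "
            "'namespace' and 'bucket' together, but not 'uri' with 'namespace' or 'bucket'."
        )
    if not (uri or (namespace and bucket)):
        raise ValueError(
            f"Artifacts cannot be {action}. You must provide either a 'uri' "
            "or both 'namespace' and 'bucket'."
        )


validate_location_args(uri="oci://bucket@namespace/prefix")                      # passes
validate_location_args(namespace="namespace", bucket="bucket", action="removed")  # passes
try:
    validate_location_args(namespace="namespace")  # 'bucket' is missing
except ValueError as err:
    print(err)
```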

mrDzurb · 2024-06-17T17:35:50Z

ads/model/datascience_model.py

+        # Remove if the model already exists
+        self.remove_artifact(namespace=namespace, bucket=bucket, prefix=prefix)
+
+        def check_if_file_exists(fileName):


NIT: I would not recommend creating nested functions; it would be hard to write unit tests for them. I would rather move such functions into utils or outside the class.
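Following this suggestion, the nested check_if_file_exists logic could become a plain module-level function that takes the already-listed object names as input, which makes it testable without any Object Storage access. A sketch under that assumption; the function name and the "expected names are relative to the prefix" convention are hypothetical.

```python
from typing import Iterable, List, Optional


def find_missing_files(
    object_names: Iterable[str], prefix: Optional[str], expected: List[str]
) -> List[str]:
    """Return the expected file names that do not appear under `prefix`."""
    base = f"{prefix.rstrip('/')}/" if prefix else ""
    present = set(object_names)
    return [name for name in expected if f"{base}{name}" not in present]


listed = ["model-linear-1/model.pkl", "model-linear-1/score.py"]
print(find_missing_files(listed, "model-linear-1", ["model.pkl", "runtime.yaml"]))
# ['runtime.yaml']
```

A unit test for this only needs a list of strings, while the listing itself can come from an existing paginated helper.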

mrDzurb · 2024-06-17T17:38:01Z

ads/model/datascience_model.py

+        ]
+
+        if len(objects) == 0:
+            error_message = (


NIT:

error_message = (
    f"No files found to add in the bucket '{bucket}' within the namespace '{namespace}' "
    f"and prefix '{prefix}'. Expected file names: {files}"
)

mrDzurb · 2024-06-17T17:38:24Z

ads/model/datascience_model.py

+        )
+        self.set_spec(self.CONST_MODEL_FILE_DESCRIPTION, tmp_model_file_description)
+
+    def remove_artifact(


Please add the examples section.

mrDzurb · 2024-06-17T17:39:04Z

ads/model/datascience_model.py

+            - If the model description JSON is None.
+        """
+
+        if uri and (namespace or bucket):


See the comments for the add_artifact method.

The requested changes were implemented; pending items will be covered in a subsequent update.

YashPandit4u added 2 commits May 28, 2024 01:11

Initial commit

6d9fa4a

Signed-off-by: Yash Pandit <[email protected]>

rename in main module file

6a930f1

Signed-off-by: Yash Pandit <[email protected]>

YashPandit4u requested review from darenr, mayoor, mrDzurb, VipulMascarenhas and qiuosier as code owners May 27, 2024 19:49

oracle-contributor-agreement bot added the OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. label May 27, 2024

darenr previously requested changes May 29, 2024

YashPandit4u and others added 6 commits June 1, 2024 20:27

Logger used for prints, error handling improved, one extra file creation removed.

5f3b316

Merge branch 'oracle:main' into feature/multi-model-artifact-handler

802e438

Reformatted using black.

35e8464

Merge branch 'feature/multi-model-artifact-handler' of https://github.com/YashPandit4u/accelerated-data-science into feature/multi-model-artifact-handler

e625107

Separate logger used.

74a235e

Added python docs for all methods.

848972e

oracle-contributor-agreement bot added OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels Jun 1, 2024

VipulMascarenhas reviewed Jun 4, 2024

Abh1navKumar suggested changes Jun 4, 2024

mrDzurb reviewed Jun 5, 2024

YashPandit4u added 3 commits June 5, 2024 15:43

Added class DataScienceModelCollection that extends from DataScienceModel

d35b917

removed old model description class

3ac5338

formatted using black

822c7e8

Abh1navKumar reviewed Jun 5, 2024

ads/model/model_collection.py (outdated, resolved)

Abh1navKumar reviewed Jun 5, 2024

ads/model/model_collection.py (outdated, resolved)

black formatter used and one return type added.

3d4d950

mrDzurb reviewed Jun 5, 2024

VipulMascarenhas reviewed Jun 11, 2024

Removed new added class.

c40ea8b

mrDzurb reviewed Jun 12, 2024

YashPandit4u and others added 7 commits June 13, 2024 16:31

Added uri based approach

d0309f4

Added unit tests.

19ec921

Changed the pydocs according to ads specifications

9089028

Merge branch 'main' into feature/multi-model-artifact-handler

464cb37

replaces regex with normal splitting for uri

49be8f8

Merge branch 'feature/multi-model-artifact-handler' of https://github.com/YashPandit4u/accelerated-data-science into feature/multi-model-artifact-handler

a2894a5

removed default_signer

5d327cd

VipulMascarenhas previously approved these changes Jun 13, 2024

Used ObjectStorageDetails.from_path(uri) for url decoding.

b7e7d74

YashPandit4u dismissed VipulMascarenhas’s stale review via b7e7d74 June 13, 2024 17:39

VipulMascarenhas previously approved these changes Jun 13, 2024

namespace, bucket, prefix way added again.

4418bf8

YashPandit4u dismissed VipulMascarenhas’s stale review via 4418bf8 June 14, 2024 18:56

Merge branch 'main' into feature/multi-model-artifact-handler

e2e7099

YashPandit4u added 4 commits June 15, 2024 15:37

Given options for both uri and (namespace, bucket) in add_artifact, remove_artifact methods

4cc8823

Updated python docs

7f3cc7f

Ran black formatter.

32f4826

Ran black formatter of UTs.

9f69e35

YashPandit4u added 2 commits June 15, 2024 16:31

Added uri UTs.

ad0ce2e

Prefix null check added.

eec2b4d

mrDzurb approved these changes Jun 17, 2024

VipulMascarenhas approved these changes Jun 17, 2024

VipulMascarenhas merged commit 30534f7 into oracle:main Jun 18, 2024
20 checks passed
