mlops-multi-account-cdk template: batch transform option #61

Open · wants to merge 4 commits into main

Conversation

marymaj (Contributor) commented on Jan 20, 2023

Issue #, if available:

Description of changes:

This PR adds the capability to create a Batch Transform SageMaker Pipeline as an alternative to deployment with a SageMaker real-time endpoint.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
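
For context, a minimal sketch of what a Batch Transform pipeline looks like in the SageMaker Python SDK; the model name and S3 paths are placeholders, not the exact values used in this PR:

```python
from sagemaker.inputs import TransformInput
from sagemaker.transformer import Transformer
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TransformStep

# Transformer wraps an already-created SageMaker model for batch scoring.
transformer = Transformer(
    model_name="my-approved-model",              # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output",   # placeholder
)

# TransformStep runs a batch transform job as one step of the pipeline.
step_transform = TransformStep(
    name="BatchTransform",
    transformer=transformer,
    inputs=TransformInput(data="s3://my-bucket/batch-input"),  # placeholder
)

pipeline = Pipeline(name="batch-inference-pipeline", steps=[step_transform])
```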

id=f"{prj_name}-source_scripts",
destination_bucket=i_bucket,
destination_key_prefix=f"{self.pipeline_name}/{self.timestamp}/source-scripts",
sources=[s3_deployment.Source.asset(path=f"{BASE_DIR}/source_scripts")],
Collaborator commented:

As per our discussion, the source scripts seem to be missing in this deploy repository. I'll test as soon as you add them :)

marymaj (Author) replied:

Source scripts added

@pasiunaite left a comment:

A couple of issues in the inference pipeline:

  • the get_approved_package_desc function is not implemented in get_approved_package.py, so the import in batch_inference_pipeline.py fails
  • a data quality step is added to the inference pipeline, but the baseline calculation it depends on is missing


```python
'''
from deploy_endpoint.get_approved_package import get_approved_package_desc
spec = get_approved_package_desc()
```
@pasiunaite commented on Feb 15, 2023:

The get_approved_package_desc() function is not implemented in the get_approved_package.py file, so this import fails.
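
For what it's worth, a minimal sketch of what get_approved_package_desc could look like, assuming it should return the full description of the latest approved model package (the signature and the group-name parameter are assumptions, not the PR's actual code):

```python
import boto3

sm_client = boto3.client("sagemaker")

def get_approved_package_desc(model_package_group_name: str) -> dict:
    """Return the describe-model-package output for the most recently
    approved model package in the given model package group."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    packages = response["ModelPackageSummaryList"]
    if not packages:
        raise ValueError(
            f"No approved model package found in {model_package_group_name}"
        )
    return sm_client.describe_model_package(
        ModelPackageName=packages[0]["ModelPackageArn"]
    )
```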

```python
'''

step_process = self.get_process_step()
step_data_quality = self.get_data_quality_step(step_process)
```
@pasiunaite commented on Feb 15, 2023:

The data quality baseline calculation step is not in the training pipeline, so the data quality check in the inference pipeline fails. Please add the data quality baseline calculation to the repo.
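
For reference, a minimal sketch of a data quality baseline step in the training pipeline using the SageMaker SDK's QualityCheckStep; the role, pipeline session, step_process reference, and output S3 URI are placeholders:

```python
from sagemaker.model_monitor import DatasetFormat
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.quality_check_step import (
    DataQualityCheckConfig,
    QualityCheckStep,
)

check_job_config = CheckJobConfig(
    role=role,                           # placeholder: pipeline execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=30,
    sagemaker_session=pipeline_session,  # placeholder
)

data_quality_check_config = DataQualityCheckConfig(
    # placeholder: training data output of the processing step
    baseline_dataset=step_process.properties.ProcessingOutputConfig.Outputs[
        "train"
    ].S3Output.S3Uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/data-quality-baseline",  # placeholder
)

# skip_check=True computes and registers a new baseline instead of
# validating against an existing one.
step_data_quality_baseline = QualityCheckStep(
    name="DataQualityBaseline",
    skip_check=True,
    register_new_baseline=True,
    quality_check_config=data_quality_check_config,
    check_job_config=check_job_config,
)
```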

```python
# Excerpt; imports added for context.
import logging
from os import getenv

logger = logging.getLogger(__file__.split('/')[-1])
logger.setLevel(getenv("LOGLEVEL", "INFO"))

def upload_assets_to_s3(account_id):
```
Collaborator commented:

Hey Maria, do we actually need that part?
Can we just keep the pipeline definition and code in the dev account and point to that file in "deploy_endpoint_stack"?
Copying the pipeline definitions and scripts could be avoided as long as the role for preprod and prod has access to the S3 bucket in the dev account.
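
A minimal sketch of that alternative in CDK, assuming i_bucket is the dev-account bucket; the account IDs and role name below are hypothetical placeholders:

```python
from aws_cdk import aws_iam as iam

# Hypothetical placeholders: substitute the real preprod/prod account IDs
# and the name of the pipeline execution role in those accounts.
for account_id in (PREPROD_ACCOUNT_ID, PROD_ACCOUNT_ID):
    i_bucket.add_to_resource_policy(
        iam.PolicyStatement(
            actions=["s3:GetObject", "s3:ListBucket"],
            resources=[i_bucket.bucket_arn, i_bucket.arn_for_objects("*")],
            principals=[
                iam.ArnPrincipal(
                    f"arn:aws:iam::{account_id}:role/{EXECUTION_ROLE_NAME}"
                )
            ],
        )
    )
```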
