
Commit a6797e4

Major upgrade to include StepFunction workflow in code pipeline. (#26)
* Adding new pipeline resources for sfn workflow
* Update run function and pipeline to load workflow inputs
* Adding new workflow notebook
* Moving retraining into monitoring section
* Add back custom resource for model monitoring
* Updating model run arguments
* Adding git commit and data version
* Updating workflow to provide input build artifact
* Update to pass deploy and sagemaker role to run
* Minor update to run.py
* Fix for model output location
* Update mlops notebook to include step functions render
* Update pipeline image to reference step functions
* Updates to the readme for updated sfn quickstart link
* Adding additional permission for code deploy
* Change step name to update workflow
* Update baseline job name to match expected format
* Fix for training job name, and resource for model-monitoring, and put metric data
* Moving notification topic to pipeline, and passing into prod deploy
* Add schedule rules that are disabled by default
* Update stepfunctions==2.0.0rc1
* Adding SNS publish permission for code deploy
* Updates to pipeline parameters
* Update to allow pass role for lambda required to start canary
* Adding default branch to be sfn-workflow
* Adding variant to deploy. Add env vars to build directly
* Update canary runtime to syn-nodejs-2.0
* Moving lambda functions into custom resources, added additional functions for workflow
* Adding batch transform and model monitoring baseline to workflow
* Adding results paths
* Minor update to use S3Downloader for transform output
* Added blocking wait for workflow
* Minor fix to step name
* Adding mlops prefix to analytics search
* Minor tweaks to sagemaker permissions and workflow ref
* Updates to Readme.md (#21)
* [fix] Replace underscore with dash in SageMaker CodeRepository name given it doesn't support underscore in name. (#20)
* Fixed typos, added rewording
* Update file directory structure in readme to match project directory structure

Co-authored-by: Tom Liu <[email protected]>
Co-authored-by: Pauline <[email protected]>

* Minor updates
* Increasing lambda timeout to 30s for pre+post build
* Add CWE for triggering S3 and CodeCommit. Update custom resource to allow listing schedule executions
* Adding cleanup code to delete stacks and wait for them to complete
* Fix drift section to push 2 metrics over 0.2 and get back stats
* Fix to use codebuild identifier to get execution id. Added SNS TopicPolicy
* Remove custom resource deletion from sagemaker role. Update SetupTraining step
* Update to remove training job identifier from schedule name
* Update to remove execution id from schedule name
* Update drift alarm to trigger on one event. Enabling update for monitoring schedule
* Fix metric drift code
* Fix schedule monitoring permissions and resource id. Update SNS notification template
* Fix to put back loop logic
* Update to create the workflow in CFN instead of updating existing. Renamed template->packaged.yml for CFN.
* Add delete for
* Add tag for IAM role
* Removed explicit tags from CFN, and saved to CFN input
* Complete rewrite of the Jupyter notebook for better workshop support (#24)
* Removed histogram and replaced matplotlib with seaborn for scatter plots
* Added more explanation and images to Data Preparation section. Fixed spelling errors. Added an explanation for the pipeline failing because the data source is missing (with screenshot). Added some stats calculation to the Data manipulation section which helps justify the code which removes outliers. Added some text to explain why we generate scatter plots in the data visualization section. Separated the data splitting and saving into its own section to avoid confusion with the data visualization section. Added an explanation of model monitor and why it requires a baseline data file to be saved at this stage.
* Added more explanation to the Start model build section. Added text to explain that the environment variables are set by a lifecycle configuration script. Added an explanation of the ZIP file required as the S3 source for CodePipeline. Clarified that the workflow notebook is optional and added a note about its purpose.
* Small fixes in Start Model Build and explanations added to Inspect Training Job. Fixed capitalization of Start Model Build to be consistent with the other section headers of the same level. Cleared output which I accidentally added in my previous commit. Removed a temporary workflow pipeline arn fix which I accidentally submitted in my previous commit. Renamed the Wait for Training Job section to Inspect Training Job to better reflect what happens in the section. Added explanations and screenshots to the Inspect Training Job section.
* Updated plots and added explanations to the Inspect Training Job section. Added an internal link to the end of the Inspect Training Job section, which allows users to quickly navigate back to this section if they want to compare the test results with the validation results. Added explanations and a screenshot to the Test Dev Deployment section. Changed the scatter plot from a pandas plot to a seaborn plot for consistency with previous sections. Changed the scatter plot to show all results instead of just the tail of a sorted list of results, which was causing confusion.
* Expanded Approve Deployment to Production section, added explanation to Test Production Deployment section. Renamed the Approve Prod Deployment section to Approve Deployment to Production for clarity. Expanded on the text explanations in the Approve Deployment to Production section. Added more explanations and a screenshot to the Test Production Deployment section. One TODO needs to be replaced in a future commit when I better understand what happens in that step.
* Added code and permissions for deleting monitoring schedules. Added permissions to the SageMaker role for listing monitoring schedules and deleting monitoring schedules. This is required for cleaning up because CloudFormation fails to delete the monitoring schedule automatically. Also added code to the Clean Up section of the notebook for deleting the monitoring schedules attached to the production endpoint. Removed stack names from the Clean Up section, which I believe was legacy.
* Moved sagemaker-custom-resource to end of the Clean Up section because it gets recreated otherwise
* Updated architecture diagram, added explanations to overview. Significantly expanded the Overview section with a more detailed architecture diagram and explanations for each step of the process. Added an Info notice about the prerequisites for completing this demo. Added an explanation of what we mean by a ‘safe’ deployment and included links to documentation on canary deployment and least privilege.
* Added more explanation and a screenshot to Model Monitor section. Expanded the Inspect Model Monitor section with more explanations and links, as well as a screenshot of the CloudWatch alarm which users can expect to see.
* Added more explanation and screenshots to CloudWatch Monitoring section. Expanded the CloudWatch Monitoring section with more explanations, as well as screenshots of the CloudWatch canaries and dashboard pages.
* Removed monitoring schedule code from Clean Up section. It turns out that CloudFormation has trouble deleting the monitoring schedule attached to the production endpoint due to a permissions error. This means we won't need the code for removing the monitoring schedule in the notebook itself. We only have to fix the sagemaker-custom-resource IAM role.
* Fixed broken markdown cell in mlops notebook
* Various small adjustments (see description). Updated the alt text to be relevant for each image. Added clarification on the types of notifications sent by SNS. Added an explanation for the NaN results for the baseline job, and added a link to the section where the baseline output is explored. Removed my temporary fix for the workflow ARN problem. Added an explanation for the CodeDeploy component of the production deployment. Rewrote the explanations in the Test REST API Application section. Timed every step of the notebook and added the average waiting times to the explanations.
* Small changes based on feedback
* Added more cells to view code files while the participant waits (i.e. during model monitoring schedule and during endpoint deployment)
* Updated the alarm screenshot and added another screenshot for traffic shifting
* Clarified the explanations in several places based on feedback from Julian
* Minor updates to notebook text
* Update to set lifecycle policy correctly
* Updating pipeline to add CloudTrail which is required for S3EventRule
* Adding minor updates for workflow
* Adding missing images
* Minor text changes to cloud formation and re-training
* Update images for pipeline and lambda deploy. Added external link image and updated content to better explain REST API testing
* Minor update to split out Test REST API header
* Minor content updates relating to SageMaker Endpoint and other tweaks
* Updating readme to include additional resources
* Adding Build section and restructure the headings
* Additional minor updates
* Spelling and language changes (#25)
* Fixed spelling mistakes, duplicated text, and readability issues.
* Upgrade boto3 and awscli in codebuild job
* Update retraining link and exit canary loops when access denied for EE
* Update cat filename
* Fix SFN version at 2.19.0 to avoid issue introduced in aws/sagemaker-python-sdk@64371d3
* Update default branch to be master for merge.
* Update CFN param to reference master branch.
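Several of the changes above (the blocking wait for the workflow, exiting the canary loops) come down to polling an AWS Step Functions execution until it leaves the `RUNNING` state. A minimal sketch of that pattern, assuming a client that exposes the boto3 `describe_execution` interface; the ARN and poll interval below are placeholders, not values from this repository:

```python
import time


def wait_for_execution(sfn_client, execution_arn, poll_seconds=10):
    """Poll a Step Functions execution until it finishes.

    `sfn_client` is expected to behave like boto3's `stepfunctions`
    client: `describe_execution(executionArn=...)` returns a dict
    whose "status" key is RUNNING, SUCCEEDED, FAILED, TIMED_OUT or
    ABORTED.
    """
    while True:
        response = sfn_client.describe_execution(executionArn=execution_arn)
        status = response["status"]
        if status != "RUNNING":
            return status  # terminal state reached
        time.sleep(poll_seconds)
```

In the notebook this kind of loop is what makes the "blocking wait for workflow" possible before moving on to the batch transform and baseline steps.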
1 parent d24e104 commit a6797e4

31 files changed (+3049, −683 lines)

README.md

Lines changed: 55 additions & 34 deletions
@@ -2,38 +2,41 @@
 
 ## Introduction
 
-This is a sample solution to build a safe deployment pipeline for Amazon SageMaker. This example could be useful for any organization looking to operationalize machine learning with native AWS development tools such as AWS CodePipeline, AWS CodeBuild and AWS CodeDeploy.
+This is a sample solution to build a safe deployment pipeline for Amazon SageMaker. This example could be useful for any organization looking to operationalize machine learning with native AWS development tools such as AWS CodePipeline, AWS CodeBuild and AWS CodeDeploy.
 
-This solution provides as *safe* deployment by creating an AWS Lambda API that calls into an Amazon SageMaker Endpoint for real-time inference.
+This solution provides a *Blue/Green*, also known as a *Canary* deployment, by creating an AWS Lambda API that calls into an Amazon SageMaker Endpoint for real-time inference.
 
 ## Architecture
 
-Following is a diagram of the continuous delivery stages in the AWS Code Pipeline.
+In the following diagram, you can view the continuous delivery stages of AWS CodePipeline.
 
-1. Build Artifacts: Runs a AWS CodeBuild job to create AWS CloudFormation templates.
+1. Build Artifacts: Runs an AWS CodeBuild job to create AWS CloudFormation templates.
 2. Train: Trains an Amazon SageMaker pipeline and Baseline Processing Job
 3. Deploy Dev: Deploys a development Amazon SageMaker Endpoint
-4. Deploy Prod: Deploys an AWS API Gateway Lambda in front of Amazon SageMaker Endpoints using AWS CodeDeploy for blue/green deployment and rollback.
+4. Deploy Prod: Deploys an Amazon API Gateway endpoint and AWS Lambda function in front of Amazon SageMaker Endpoints using AWS CodeDeploy for blue/green deployment and rollback.
 
 ![code-pipeline](docs/code-pipeline.png)
 
 ### Components Details
 
-- [**AWS SageMaker**](https://aws.amazon.com/sagemaker/) – This solution uses SageMaker to train the model to be used and host the model at an endpoint, where it can be accessed via HTTP/HTTPS requests
-- [**AWS CodePipeline**](https://aws.amazon.com/codepipeline/) – CodePipeline has various stages defined in CloudFormation which step through which actions must be taken in which order to go from source code to creation of the production endpoint.
-- [**AWS CodeBuild**](https://aws.amazon.com/codebuild/) – This solution uses CodeBuild to build the source code from GitHub
-- [**AWS CloudFormation**](https://aws.amazon.com/cloudformation/) – This solution uses the CloudFormation Template language, in either YAML or JSON, to create each resource including custom resource.
-- [**AWS S3**](https://aws.amazon.com/s3/) – Artifacts created throughout the pipeline as well as the data for the model is stored in an Simple Storage Service (S3) Bucket.
+- [**AWS CodePipeline**](https://aws.amazon.com/codepipeline/) – CodePipeline has various stages defined in CloudFormation, which step through which actions must be taken in which order to go from source code to creation of the production endpoint.
+- [**AWS CodeBuild**](https://aws.amazon.com/codebuild/) – This solution uses AWS CodeBuild to build the source code from GitHub.
+- [**Amazon S3**](https://aws.amazon.com/s3/) – Artifacts created throughout the pipeline, as well as the data for the model, are stored in an Amazon Simple Storage Service (S3) bucket.
+- [**AWS CloudFormation**](https://aws.amazon.com/cloudformation/) – This solution uses the AWS CloudFormation Template language, in either YAML or JSON, to create each resource including a custom resource.
+- [**AWS Step Functions**](https://aws.amazon.com/step-functions/) – This solution creates AWS Step Functions workflows to orchestrate Amazon SageMaker training and processing jobs.
+- [**Amazon SageMaker**](https://aws.amazon.com/sagemaker/) – This solution uses Amazon SageMaker to train and deploy the machine learning model.
+- [**AWS CodeDeploy**](https://aws.amazon.com/codedeploy/) – This solution uses AWS CodeDeploy to automate shifting traffic between two AWS Lambda functions.
+- [**Amazon API Gateway**](https://aws.amazon.com/api-gateway/) – This solution creates an HTTPS REST API endpoint for the AWS Lambda functions that invoke the deployed Amazon SageMaker Endpoint.
 
 ## Deployment Steps
 
-Following is the list of steps required to get up and running with this sample.
+The following is the list of steps required to get up and running with this sample.
 
 ### Prepare an AWS Account
 
 Create your AWS account at [http://aws.amazon.com](http://aws.amazon.com) by following the instructions on the site.
 
-### Optionally Fork this GitHub Repository and create an Access Token
+### *Optionally* fork this GitHub Repository and create an Access Token
 
 1. [Fork](https://github.com/aws-samples/sagemaker-safe-deployment-pipeline/fork) a copy of this repository into your own GitHub account by clicking the **Fork** in the upper right-hand corner.
 2. Follow the steps in the [GitHub documentation](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line) to create a new (OAuth 2) token with the following scopes (permissions): `admin:repo_hook` and `repo`. If you already have a token with these permissions, you can use that. You can find a list of all your personal access tokens in [https://github.com/settings/tokens](https://github.com/settings/tokens).
@@ -43,18 +46,20 @@ Create your AWS account at [http://aws.amazon.com](http://aws.amazon.com) by fol
 
 Click on the **Launch Stack** button below to launch the CloudFormation Stack to set up the SageMaker safe deployment pipeline.
 
-[![Launch CFN stack](https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/quickcreate?templateUrl=https%3A%2F%2Famazon-sagemaker-safe-deployment-pipeline.s3.amazonaws.com%2Fpipeline.yml&stackName=nyctaxi&param_GitHubBranch=master&param_GitHubRepo=amazon-sagemaker-safe-deployment-pipeline&param_GitHubUser=aws-samples&param_ModelName=nyctaxi&param_NotebookInstanceType=ml.t3.medium)
+[![Launch CFN stack](https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/quickcreate?templateUrl=https%3A%2F%2Famazon-sagemaker-safe-deployment-pipeline.s3.amazonaws.com%2Fsfn%2Fpipeline.yml&stackName=nyctaxi&param_GitHubBranch=master&param_GitHubRepo=amazon-sagemaker-safe-deployment-pipeline&param_GitHubUser=aws-samples&param_ModelName=nyctaxi&param_NotebookInstanceType=ml.t3.medium)
 
-Provide a stack name eg **sagemaker-safe-deployment-pipeline** and specify the parameters
+Provide a stack name, e.g. **sagemaker-safe-deployment-pipeline**, and specify the parameters.
 
 Parameters | Description
 ----------- | -----------
-Model Name | A unique name for this model (must less then 15 characters long).
-Notebook Instance Type | The [Amazon SageMaker instance type](https://aws.amazon.com/sagemaker/pricing/instance-types/). Default is ml.t3.medium
+Model Name | A unique name for this model (must be less than 15 characters long).
+S3 Bucket for Dataset | The bucket containing the dataset (defaults to [nyc-tlc](https://registry.opendata.aws/nyc-tlc-trip-records-pds/)).
+Notebook Instance Type | The [Amazon SageMaker instance type](https://aws.amazon.com/sagemaker/pricing/instance-types/). Default is ml.t3.medium.
 GitHub Repository | The name (not URL) of the GitHub repository to pull from.
 GitHub Branch | The name (not URL) of the GitHub repository’s branch to use.
-GitHub Username | GitHub Username for this repository. Update this if you have Forked the repository.
-GitHub Access Token | The Optional Secret OAuthToken with access to your GitHub repo.
+GitHub Username | GitHub Username for this repository. Update this if you have forked the repository.
+GitHub Access Token | The optional Secret OAuthToken with access to your GitHub repository.
+Email Address | The optional email address to notify on successful or failed deployments.
 
 ![code-pipeline](docs/stack-parameters.png)
 
@@ -72,7 +77,7 @@ You can launch the same stack using the AWS CLI. Here's an example:
 
 ### Start, Test and Approve the Deployment
 
-Once the deployment has completed, there will be a new AWS CodePipeline created linked to your GitHub source. You will notice initially that it will be in a *Failed* state as it is waiting on an S3 data source.
+Once the deployment is complete, there will be a new AWS CodePipeline created, with a Source stage that is linked to your source code repository. You will notice initially that it will be in a *Failed* state as it is waiting on an S3 data source.
 
 ![code-pipeline](docs/data-source-before.png)
 
@@ -98,14 +103,14 @@ Finally, the SageMaker Notebook provides the ability to retrieve the results fro
 
 ### Approximate Times:
 
-Following is a lis of approximate running times fo the pipeline
+The following is a list of approximate running times for the pipeline:
 
 * Full Pipeline: 35 minutes
-* Start Build: 2 Minutes
-* Model Training and Baseline: 5 Minutes
+* Start Build: 2 minutes
+* Model Training and Baseline: 5 minutes
 * Launch Dev Endpoint: 10 minutes
 * Launch Prod Endpoint: 15 minutes
-* Monitoring Schedule: Runs on the hour
+* Monitoring Schedule: runs on the hour
 
 ## Customizing for your own model
 
@@ -118,22 +123,36 @@ This project is written in Python, and design to be customized for your own mode
 │   ├── app.py
 │   ├── post_traffic_hook.py
 │   └── pre_traffic_hook.py
+├── assets
+│   ├── deploy-model-dev.yml
+│   ├── deploy-model-prod.yml
+│   ├── suggest-baseline.yml
+│   └── training-job.yml
+├── custom_resource
+│   ├── __init__.py
+│   ├── sagemaker_monitoring_schedule.py
+│   ├── sagemaker_suggest_baseline.py
+│   ├── sagemaker_training_job.py
+│   └── sagemaker-custom-resource.yml
 ├── model
 │   ├── buildspec.yml
+│   ├── dashboard.json
 │   ├── requirements.txt
 │   └── run.py
 ├── notebook
+│   ├── canary.js
+│   ├── dashboard.json
 │   └── mlops.ipynb
 └── pipeline.yml
 ```
 
 Edit the `get_training_params` method in the `model/run.py` script that is run as part of the AWS CodeBuild step to add your own estimator or model definition.
 
-Extend the AWS Lambda hooks in `api/pre_traffic_hook.py` and `api/post_traffic_hook.py` to add your own validation or inference against the deployed Amazon SageMaker endpoints. Also you can edit the `api/app.py` lambda to add any enrichment or transformation to the request/response payload.
+Extend the AWS Lambda hooks in `api/pre_traffic_hook.py` and `api/post_traffic_hook.py` to add your own validation or inference against the deployed Amazon SageMaker endpoints. You can also edit the `api/app.py` lambda to add any enrichment or transformation to the request/response payload.
 
 ## Running Costs
 
-This section outlines cost considerations for running the SageMaker Safe Deployment Pipeline. Completing the pipeline will deploy development and production SageMaker endpoints which will cost less than $10 per day. Further cost breakdowns are below.
+This section outlines cost considerations for running the SageMaker Safe Deployment Pipeline. Completing the pipeline will deploy development and production SageMaker endpoints which will cost less than $10 per day. Further cost breakdowns are below.
 
 - **CodeBuild** – Charges per minute used. First 100 minutes each month come at no charge. For information on pricing beyond the first 100 minutes, see [AWS CodeBuild Pricing](https://aws.amazon.com/codebuild/pricing/).
 - **CodeCommit** – $1/month if you didn't opt to use your own GitHub repository.
@@ -143,27 +162,29 @@ This section outlines cost considerations for running the SageMaker Safe Deploym
 - Canaries cost $0.0012 per run, or $5/month if they run every 10 minutes.
 - Dashboards cost $3/month.
 - Alarm metrics cost $0.10 per alarm.
-- **KMS** – $1/month for the key created.
-- **Lambda** - Low cost, $0.20 per 1 million request see [Amazon Lambda Pricing](https://aws.amazon.com/lambda/pricing/)
+- **CloudTrail** – Low cost, $0.10 per 100,000 data events to enable the [S3 CloudWatch Event](https://docs.aws.amazon.com/codepipeline/latest/userguide/create-cloudtrail-S3-source-console.html). For more information, see [AWS CloudTrail Pricing](https://aws.amazon.com/cloudtrail/pricing/).
+- **KMS** – $1/month for the [Customer Managed CMK](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk) created.
+- **API Gateway** – Low cost, $1.29 for the first 300 million requests. For more information, see [Amazon API Gateway pricing](https://aws.amazon.com/api-gateway/pricing/).
+- **Lambda** – Low cost, $0.20 per 1 million requests. For more information, see [AWS Lambda Pricing](https://aws.amazon.com/lambda/pricing/).
 - **SageMaker** – Prices vary based on EC2 instance usage for the Notebook Instances, Model Hosting, Model Training and Model Monitoring; each charged per hour of use. For more information, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).
   - The `ml.t3.medium` instance *notebook* costs $0.0582 an hour.
   - The `ml.m4.xlarge` instance for the *training* job costs $0.28 an hour.
   - The `ml.m5.xlarge` instance for the *monitoring* baseline costs $0.269 an hour.
   - The `ml.t2.medium` instance for the dev *hosting* endpoint costs $0.065 an hour.
   - The two `ml.m5.large` instances for production *hosting* endpoint costs 2 x $0.134 per hour.
   - The `ml.m5.xlarge` instance for the hourly scheduled *monitoring* job costs $0.269 an hour.
-- **S3** – Prices Vary, depends on size of model/artifacts stored. For first 50 TB each month, costs only $0.023 per GB stored. For more information, see [Amazon S3 Pricing](https://aws.amazon.com/s3/pricing/).
+- **S3** – Prices will vary depending on the size of the model/artifacts stored. The first 50 TB each month will cost only $0.023 per GB stored. For more information, see [Amazon S3 Pricing](https://aws.amazon.com/s3/pricing/).
 
 ## Cleaning Up
 
-First delete the stacks used as part of the pipeline for deployment, training job and suggest baseline. For a model name of **nyctaxi** that would be.
+First, delete the stacks used as part of the pipeline for deployment, training job and suggest baseline. For a model name of **nyctaxi** that would be:
 
-* *nyctaxi*-devploy-prd
-* *nyctaxi*-devploy-dev
-* *nyctaxi*-training-job
-* *nyctaxi*-suggest-baseline
+* *nyctaxi*-deploy-prd
+* *nyctaxi*-deploy-dev
+* *nyctaxi*-workflow
+* sagemaker-custom-resource
 
-Then delete the stack you created.
+Finally, delete the stack you created in AWS CloudFormation.
 
 ## Security
 
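The Cleaning Up hunk above lists the stacks to delete for a model named **nyctaxi**, and the commit message mentions "adding cleanup code to delete stacks and wait for them to complete". A sketch of how that cleanup could be scripted, assuming a client that follows boto3's CloudFormation interface (`delete_stack` plus the `stack_delete_complete` waiter); the stack names come from the README, everything else is illustrative:

```python
# Stacks listed in the Cleaning Up section for model name "nyctaxi":
STACKS = [
    "nyctaxi-deploy-prd",
    "nyctaxi-deploy-dev",
    "nyctaxi-workflow",
    "sagemaker-custom-resource",
]


def delete_stacks(cfn_client, stack_names):
    """Delete each CloudFormation stack in order and block until it is gone."""
    deleted = []
    for name in stack_names:
        cfn_client.delete_stack(StackName=name)
        # The waiter polls until the stack no longer exists.
        cfn_client.get_waiter("stack_delete_complete").wait(StackName=name)
        deleted.append(name)
    return deleted
```

Deleting the deployment stacks before the pipeline stack matters here because the pipeline stack owns roles the other stacks depend on.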
assets/deploy-model-dev.yml

Lines changed: 5 additions & 11 deletions
@@ -1,4 +1,4 @@
-Description: Deploy a model at Sagemaker
+Description: Deploy the development Amazon SageMaker Endpoint.
 Parameters:
   ImageRepoUri:
     Type: String
@@ -9,10 +9,10 @@ Parameters:
   TrainJobId:
     Type: String
     Description: Id of the Codepipeline + SagemakerJobs
-  MLOpsRoleArn:
+  DeployRoleArn:
     Type: String
     Description: The role for executing the deployment
-  VariantName:
+  ModelVariant:
     Type: String
     Description: Name of the endpoint variant
   KmsKeyId:
@@ -27,7 +27,7 @@ Resources:
       PrimaryContainer:
         Image: !Ref ImageRepoUri
         ModelDataUrl: !Sub s3://sagemaker-${AWS::Region}-${AWS::AccountId}/${ModelName}/mlops-${ModelName}-${TrainJobId}/output/model.tar.gz
-      ExecutionRoleArn: !Ref MLOpsRoleArn
+      ExecutionRoleArn: !Ref DeployRoleArn
 
   EndpointConfig:
     Type: "AWS::SageMaker::EndpointConfig"
@@ -37,18 +37,12 @@
           InitialVariantWeight: 1.0
           InstanceType: ml.t2.medium
           ModelName: !GetAtt Model.ModelName
-          VariantName: !Ref VariantName
+          VariantName: !Sub ${ModelVariant}-${ModelName}
       EndpointConfigName: !Sub mlops-${ModelName}-dec-${TrainJobId}
       KmsKeyId: !Ref KmsKeyId
-      Tags:
-        - Key: Name
-          Value: !Sub mlops-${ModelName}-dec-${TrainJobId}
 
   Endpoint:
     Type: "AWS::SageMaker::Endpoint"
     Properties:
       EndpointName: !Sub mlops-${ModelName}-dev-${TrainJobId}
       EndpointConfigName: !GetAtt EndpointConfig.EndpointConfigName
-      Tags:
-        - Key: Name
-          Value: !Sub mlops-${ModelName}-dev-${TrainJobId}
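After this commit the template expects `DeployRoleArn` and `ModelVariant` instead of `MLOpsRoleArn` and `VariantName`, so whatever deploys it must pass the renamed parameters. A sketch of building the parameter list in the shape CloudFormation's `create_stack`/`update_stack` calls expect; the concrete values below are placeholders, not values from this repository:

```python
def to_cfn_parameters(values):
    """Convert a plain dict into CloudFormation's ParameterKey/ParameterValue list."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


# Parameters for assets/deploy-model-dev.yml after the rename
# (angle-bracketed values are illustrative placeholders):
dev_params = to_cfn_parameters({
    "ImageRepoUri": "<ecr-image-uri>",
    "ModelName": "nyctaxi",
    "TrainJobId": "<pipeline-execution-id>",
    "DeployRoleArn": "<deploy-role-arn>",
    "ModelVariant": "dev",
    "KmsKeyId": "<kms-key-id>",
})
```

Passing the old `MLOpsRoleArn`/`VariantName` keys against the updated template would fail stack validation, which is why the pipeline and `run.py` were updated in the same commit.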
