---
layout: post
title: "Mastering Custom SageMaker Deployment: A Comprehensive Guide"
description:
  "A deep dive into the intricacies of deploying custom models to Amazon
  SageMaker"
image: /assets/images/custom-sagemaker.jpg
project: false
permalink: "/blog/:title/"
tags:
  - machine-learning
  - aws
  - sagemaker
---

## Introduction

Recently, I embarked on what I initially thought would be a straightforward
journey: deploying a custom BERT-style model, trained on Databricks and
packaged by MLflow, to Amazon SageMaker. Little did I know that this endeavor
would lead me down a rabbit hole of CUDA errors, SageMaker-specific arguments,
and scattered documentation. This two-week journey of dissecting SageMaker
internals and scouring the web for information has equipped me with valuable
insights that I'm eager to share. Consider this post your guide if you ever
find yourself in a similar predicament.

## The Scenario

Imagine a BERT-style model trained for token classification (essentially Named
Entity Recognition or NER), enhanced with post-processing steps to refine its
output by eliminating false positives based on a predefined list of terms. This
model was developed on Databricks, leveraging its seamless MLflow integration
for experiment tracking and logging. To maintain consistency, we opted to
package the model as an MLflow bundle with a
[custom class](https://mlflow.org/blog/custom-pyfunc) encapsulating all
necessary post-processing steps, with the intention of deploying it to
SageMaker directly via MLflow.
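
To make the idea concrete, here is a minimal sketch of what such a pyfunc
wrapper can look like. The class name, artifact key, and term list are
illustrative assumptions, not the production code:

```python
import mlflow.pyfunc

# Hypothetical list of terms treated as false positives during post-processing
FALSE_POSITIVE_TERMS = {"inc", "ltd", "corp"}


class TokenClassifierWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load the fine-tuned token-classification pipeline from the logged
        # artifacts; "model_dir" is an assumed artifact key
        from transformers import pipeline

        self.ner = pipeline(
            "token-classification",
            model=context.artifacts["model_dir"],
            aggregation_strategy="simple",
        )

    def predict(self, context, model_input):
        results = []
        for text in model_input:
            entities = self.ner(text)
            # Drop predictions that match the predefined false-positive terms
            results.append(
                [e for e in entities if e["word"].lower() not in FALSE_POSITIVE_TERMS]
            )
        return results
```

Logging a wrapper like this with `mlflow.pyfunc.log_model(python_model=...)` is
what produces the bundle that later ends up under `/opt/ml/model`.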

Our initial approach was to use MLflow's built-in method for creating a
SageMaker-compatible image:

```bash
mlflow sagemaker build-and-push-container
```

This command is designed to generate a SageMaker-compatible image for seamless
deployment. However, we quickly discovered a significant limitation: the
resulting image is CPU-only, unsuitable for our Large Language Model (LLM)
needs (yes, BERT is an LLM). This realization necessitated a pivot to manually
building and pushing our own GPU-enabled image to Amazon Elastic Container
Registry (ECR) for SageMaker deployment.

## Diving into the Deep End

### The Dockerfile Dilemma

Creating a Dockerfile is typically straightforward, but SageMaker introduces
its own set of requirements and nuances. Key questions emerged:

- What should the entrypoint be?
- Are there CUDA version restrictions?
- How does SageMaker execute the Docker image?

None of this was immediately clear from the available documentation.

### Initial Attempts and Roadblocks

Drawing inspiration from a colleague's SageMaker deployment Dockerfile, which
curiously lacked an entrypoint, I initially opted for `nvidia/cuda` as the base
image and attempted to use MLflow for model serving:

```docker
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04

# Install Python and necessary packages
RUN apt update && apt install -y python3 python3-pip
RUN pip install mlflow==2.15.1 transformers torch sagemaker

# Set the entrypoint
ENTRYPOINT ["mlflow", "models", "serve", "-m", "/opt/ml/model", "-h", "0.0.0.0", "-p", "8080"]
```

This approach, however, led to an unexpected error about an unknown `serve`
argument. After some investigation and experimentation, including removing the
entrypoint entirely, we only encountered more cryptic errors, suggesting deeper
issues with our container configuration.

## Going Fully Custom

After hitting multiple dead ends, I decided to take a step back and rethink our
approach. The next day, I opted to drop the use of MLflow for serving and
instead create a custom API inside the Docker image using FastAPI. We would
still use MLflow to load the model file, as our post-processing logic was
defined there. This approach allowed me to test the image locally, and it
worked fine. However, once deployed to SageMaker, we encountered the same
`serve` error.
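
For reference, the local test amounted to running the container with the model
directory mounted and calling the API directly. A minimal sketch against the
endpoints the final image exposes; the image name, port, and payload are
illustrative:

```python
import requests

# Assumes the image is running locally, e.g.:
#   docker run -p 8080:8080 -v /path/to/model:/opt/ml/model custom-ner:latest
BASE_URL = "http://localhost:8080"

# Health check, mirroring SageMaker's /ping probe
print(requests.get(f"{BASE_URL}/ping").json())

# Prediction request against the /invocations endpoint
payload = {"text": ["Jane Doe visited Kuala Lumpur last week."]}
print(requests.post(f"{BASE_URL}/invocations", json=payload).json())
```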

This persistent issue prompted a deeper investigation into how SageMaker
actually starts these containers. The AWS documentation, unfortunately, didn't
provide a clear, consolidated answer. Ironically, it was Amazon Q, AWS's own AI
assistant, that provided the crucial information, outlining exactly how
SageMaker starts containers and what a container needs in order to work
properly.

### The SageMaker Container Startup Revelation

The key revelation was that SageMaker passes a `serve` argument when starting
the container, roughly equivalent to:

```bash
docker run --volume /path/to/model:/opt/ml/model image_name serve
```

This insight was a game-changer. It explained why our previous attempts failed
and provided a clear direction for our solution. To handle this, we needed to
make our entrypoint a shell script that could correctly consume (or safely
ignore) the extra argument.

### Unraveling SageMaker's Requirements

Through further research and experimentation (mostly bugging Q about it), we
pieced together the following requirements for a SageMaker-compatible
container:

1. **API Endpoints**: The container must expose two specific endpoints:

   - `/invocations` for handling prediction requests
   - `/ping` for health checks

2. **Port Binding**: The Docker image must be labeled to accept port binding,
   as SageMaker dynamically changes the deployment port.

3. **Port Configuration**: The server inside the container should use the port
   specified in the `SAGEMAKER_BIND_TO_PORT` environment variable.

4. **Custom Inference Script**: We needed to inform SageMaker that we're using
   a custom inference script by specifying its name as an environment variable
   when creating the `Model` object.

These requirements led us to create the following files:

### entrypoint.sh

```bash
#!/bin/bash

# Use the port SageMaker sets via SAGEMAKER_BIND_TO_PORT, or default to 8080
PORT=${SAGEMAKER_BIND_TO_PORT:-8080}

# SageMaker appends a `serve` argument when starting the container; this
# script can simply ignore it, since the image only ever serves.

# Start the API server.
# Yes, the use of exec is actually necessary here: uvicorn replaces the shell
# as PID 1, so it receives stop signals from SageMaker directly.
exec uvicorn inference:app --host 0.0.0.0 --port $PORT --workers 4
```

### inference.py

```python
import mlflow.pyfunc
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Dict, Any

app = FastAPI()

# Load the model from the default SageMaker model directory
model = mlflow.pyfunc.load_model("/opt/ml/model")


class PredictionRequest(BaseModel):
    text: List[str]


class PredictionResponse(BaseModel):
    predictions: List[Dict[str, Any]]


@app.get("/ping")
def ping():
    return {"status": "ok"}


@app.post("/invocations", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        predictions = model.predict(request.text)
        return PredictionResponse(predictions=predictions)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


if __name__ == "__main__":
    import os
    import uvicorn

    # SAGEMAKER_BIND_TO_PORT arrives as a string, so cast it to an int
    port = int(os.environ.get("SAGEMAKER_BIND_TO_PORT", 8080))
    uvicorn.run(app, host="0.0.0.0", port=port)
```

### requirements.txt

```
mlflow==2.15.1
cloudpickle==2.2.1
fastapi
uvicorn
pydantic
transformers
```

## Additional Challenges

Even with these requirements in place, we faced two more significant hurdles:

1. **CUDA Version Issues**: We found that the CUDA version of the Docker image
   needed to match (at least on the major version) the CUDA version of the
   instance type. This was particularly challenging because, as usual, CUDA
   versions for instance types are not documented anywhere.

2. **Python Version Compatibility**: We discovered that the Python version,
   along with some dependencies like MLflow and cloudpickle, needed to match
   the versions used during model packaging. This led to our decision to use
   pyenv in the Dockerfile to ensure we had the correct Python version; a quick
   way to check what the bundle expects is sketched below.
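
Before pinning versions, it helps to inspect what the MLflow bundle was
actually logged with. A small sketch; the model URI is illustrative, and the
recorded Python version also lives in the bundle's `python_env.yaml`:

```python
import mlflow.pyfunc

# Illustrative URI; a local path, runs:/ or models:/ URI works as well
MODEL_URI = "s3://my-bucket/models/ner"

# Returns a local path to the pip requirements the model was logged with;
# these are the versions (MLflow, cloudpickle, ...) the image needs to mirror
reqs_file = mlflow.pyfunc.get_model_dependencies(MODEL_URI, format="pip")
print(open(reqs_file).read())
```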

These additional considerations resulted in our final, more complex Dockerfile:

```docker
FROM nvidia/cuda:11.4.3-runtime-ubuntu20.04

# sagemaker labels
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

# install system build dependencies (required by pyenv)
ENV HOME="/root"
ENV DEBIAN_FRONTEND=noninteractive
WORKDIR ${HOME}
RUN apt update && apt install -y \
    build-essential \
    curl \
    git \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    lzma \
    liblzma-dev \
    wget \
    xz-utils \
    tk-dev \
    libffi-dev \
    python3-dev \
    gnupg

# cleanup to reduce image size
RUN apt clean && rm -rf /var/lib/apt/lists/*

# install pyenv
RUN git clone --depth=1 https://github.com/pyenv/pyenv.git .pyenv
ENV PYENV_ROOT="${HOME}/.pyenv"
ENV PATH="${PYENV_ROOT}/shims:${PYENV_ROOT}/bin:${PATH}"

# install correct python version
ENV PYTHON_VERSION=3.11
RUN pyenv install ${PYTHON_VERSION}
RUN pyenv global ${PYTHON_VERSION}

# install compatible pytorch version
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install necessary Python packages
COPY ./requirements.txt .
RUN pip install -r requirements.txt

# copy code
COPY . /app
WORKDIR /app

# Define the entry point for the container
RUN chmod +x entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]
```
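
With the image built and pushed to ECR, the last piece is registering it as a
SageMaker model and deploying an endpoint. A minimal sketch with the SageMaker
Python SDK; the image URI, model data path, role, instance type, and the
inference-script environment variable are illustrative assumptions rather than
values from our deployment:

```python
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

model = Model(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/custom-ner:latest",
    model_data="s3://my-bucket/models/ner/model.tar.gz",  # extracted to /opt/ml/model at startup
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    env={"SAGEMAKER_PROGRAM": "inference.py"},  # custom inference script; variable name assumed
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # GPU instance whose CUDA version is compatible with the image
)
```

From there, SageMaker mounts the model bundle under `/opt/ml/model`, starts the
container with the `serve` argument described earlier, and health-checks
`/ping` before the endpoint goes into service.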

This journey through the intricacies of custom SageMaker deployment has been
both challenging and enlightening. It underscores the complexity of deploying
sophisticated machine learning models in cloud environments and the importance
of understanding the underlying infrastructure. Key takeaways from this
experience include the critical nature of proper documentation (if anyone from
AWS is reading this) and the value of persistence in problem-solving. We've
learned that successful deployment often requires a deep dive into system
internals, a willingness to experiment, and the ability to piece together
information from various sources. This process has not only resulted in a
working solution but has also equipped us with valuable knowledge for future
deployments. Developing the model that was deployed is another story for
another blog post, when I'm bored enough to recreate it with non-proprietary
data (brainrot classifier?).
