I used
from sagemaker.huggingface import HuggingFaceModel

deploy = HuggingFaceModel(
    name=model_name,
    role=role,
    code_location="abc",
    model_data=path_to_s3,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    model_server_workers=1,
)

emb = deploy.deploy(
    endpoint_name=model_name,
    initial_instance_count=1,
    instance_type="ml.c5.4xlarge",
    container_startup_health_check_timeout=300,
)
The custom inference script was:
def model_fn(model_dir):
    processor = DataProcess()  # A class that contains the logic for processing each file.
    return processor


def predict_fn(data, model):
    text = model.process_file(data)
    return {"output": text}
The input data is a base64-encoded string of the file contents.
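For reference, I call the endpoint roughly like this (the file name is just a placeholder; I JSON-encode the base64 string as the request body so the container's default JSON handling applies):

import base64
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Read a local file and send its contents as a base64 string.
with open("sample.pdf", "rb") as f:  # placeholder file name
    payload = base64.b64encode(f.read()).decode("utf-8")

response = runtime.invoke_endpoint(
    EndpointName=model_name,
    ContentType="application/json",
    Body=json.dumps(payload),  # the body is just the base64 string, JSON-encoded
)
print(json.loads(response["Body"].read()))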
It's strange: when the file is small (under 1 MB), the server runs model_fn and predict_fn once and the request takes around 30 seconds. But when I send a larger file of around 1.5 MB, model_fn and predict_fn are run multiple times, each run lasting around 2 minutes. I know this because the same request produces multiple log entries like:
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 5.128383636474609 ms
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 162199.17178153992 ms
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.00762939453125 ms
It's probably unorthodox to use an inference endpoint for a data-processing job like this, but what configuration did I miss?
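One thing I wondered about is the model server response timeout. If that is what I'm missing, I assume it would be set as an environment variable on the model, something like the following (an untested guess on my side, not a confirmed fix):

deploy = HuggingFaceModel(
    name=model_name,
    role=role,
    code_location="abc",
    model_data=path_to_s3,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    model_server_workers=1,
    # Guess: raise the model server response timeout (in seconds) so a
    # long-running predict_fn is not timed out and re-run.
    env={"SAGEMAKER_MODEL_SERVER_TIMEOUT": "600"},
)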
Related: aws/amazon-sagemaker-examples#1073