I used
from sagemaker.huggingface import HuggingFaceModel

deploy = HuggingFaceModel(
    name=model_name,
    role=role,
    code_location="abc",
    model_data=path_to_s3,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    model_server_workers=1,
)

emb = deploy.deploy(
    endpoint_name=model_name,
    initial_instance_count=1,
    instance_type="ml.c5.4xlarge",
    container_startup_health_check_timeout=300,
)
The custom inference script was:
def model_fn(model_dir):
    processor = DataProcess()  # A class that contains the logic for processing each file.
    return processor


def predict_fn(data, model):
    text = model.process_file(data)
    return {"output": text}
The input data is a base64-encoded string of the file contents.
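For reference, I call the endpoint roughly like this (the file name is just a placeholder; I JSON-encode the base64 string as the request body so the container's default JSON handling applies):

import base64
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Read a local file and send its contents as a base64 string.
with open("sample.pdf", "rb") as f:  # placeholder file name
    payload = base64.b64encode(f.read()).decode("utf-8")

response = runtime.invoke_endpoint(
    EndpointName=model_name,
    ContentType="application/json",
    Body=json.dumps(payload),  # the body is just the base64 string, JSON-encoded
)
print(json.loads(response["Body"].read()))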
It's strange: when the file is small (under 1 MB), the server runs model_fn and predict_fn once and the request takes around 30 seconds. But when I send a larger file of around 1.5 MB, model_fn and predict_fn are run multiple times, each run lasting around 2 minutes. I know this because the same request produces multiple log entries like:
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 5.128383636474609 ms
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 162199.17178153992 ms
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.00762939453125 ms
It's probably unorthodox to use an inference endpoint for a data-processing job like this, but what configuration did I miss?
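One thing I wondered about is the model server response timeout. If that is what I'm missing, I assume it would be set as an environment variable on the model, something like the following (an untested guess on my side, not a confirmed fix):

deploy = HuggingFaceModel(
    name=model_name,
    role=role,
    code_location="abc",
    model_data=path_to_s3,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    model_server_workers=1,
    # Guess: raise the model server response timeout (in seconds) so a
    # long-running predict_fn is not timed out and re-run.
    env={"SAGEMAKER_MODEL_SERVER_TIMEOUT": "600"},
)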
Related: aws/amazon-sagemaker-examples#1073