Server reruns same task multiple times #133

@kurtgdl

Description

I used

deploy = HuggingFaceModel(
    name=model_name,
    role=role,
    code_location="abc",
    model_data=path_to_s3,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    model_server_workers=1,
)
emb = deploy.deploy(
    endpoint_name=model_name,
    initial_instance_count=1,
    instance_type="ml.c5.4xlarge",
    container_startup_health_check_timeout=300,
)
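For context, the endpoint is invoked roughly like this (a sketch, not my exact client code: the JSON payload shape, the "inputs" key, and the boto3 runtime call are assumptions):

```python
import base64
import json


def build_payload(file_bytes: bytes) -> str:
    """Encode raw file bytes as the base64 JSON body the handler expects."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    # "inputs" is an assumed key; the real payload shape may differ.
    return json.dumps({"inputs": encoded})


def invoke(endpoint_name: str, file_bytes: bytes) -> dict:
    # Real-time invocation; by default the boto3 client retries requests
    # that time out, which matters for long-running predictions.
    import boto3  # imported lazily; invoke() is not exercised without AWS access

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(file_bytes),
    )
    return json.loads(response["Body"].read())
```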

The custom script was

def model_fn(model_dir):
    # No model weights are loaded here: DataProcess is a class that
    # contains the logic for processing each file.
    processor = DataProcess()
    return processor

def predict_fn(data, model):
    # `data` is the deserialized request payload; `model` is the
    # DataProcess instance returned by model_fn.
    text = model.process_file(data)
    return {"output": text}
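Since the payload is a base64 string, one option is to decode it up front in a custom input_fn so that predict_fn receives raw bytes. A minimal sketch, assuming a JSON body with an "inputs" key (neither of which appears in the original script):

```python
import base64
import json


def input_fn(request_body, content_type="application/json"):
    """Deserialize the request and decode the base64 file content to bytes."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    payload = json.loads(request_body)
    # "inputs" is an assumed key; adjust to the actual payload shape.
    return base64.b64decode(payload["inputs"])
```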

The input data is a base64 string of the file content.
What's strange is that when the file is fairly small (under 1 MB), the server runs model_fn and predict_fn once, and the process takes around 30 seconds. But when I input a larger file of around 1.5 MB, it runs model_fn and predict_fn multiple times, each run lasting around 2 minutes. I know this because the same request produces multiple copies of:

 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 5.128383636474609 ms
 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 162199.17178153992 ms
 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.00762939453125 ms
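One detail worth noting (my own observation, not confirmed anywhere in these logs): real-time InvokeEndpoint calls are documented to require a response within 60 seconds, and the reported predict time is well past that, so client-side retries could plausibly produce the repeated runs. Parsing the log line makes the gap explicit:

```python
# Parse the predict time from the MMS log line and compare it with the
# documented 60-second response limit on real-time InvokeEndpoint calls.
log_line = (
    "[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle"
    " - Predict time - 162199.17178153992 ms"
)
predict_ms = float(log_line.rsplit("-", 1)[1].strip().split()[0])

INVOKE_ENDPOINT_LIMIT_MS = 60_000  # documented real-time response limit
print(f"predict took {predict_ms / 1000:.0f}s; limit is {INVOKE_ENDPOINT_LIMIT_MS // 1000}s")
```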

It's probably unorthodox to use an inference server for a data-processing job like this, but which configs did I miss?

Related: aws/amazon-sagemaker-examples#1073
