How to enable batch inference on an AWS-deployed serverless model from the Hub? #98

jmparejaz · 2023-09-06T21:04:11Z

I am using SageMaker Serverless Inference with a Hugging Face model from the Hub, following this example:
https://github.com/huggingface/notebooks/blob/main/sagemaker/19_serverless_inference/sagemaker-notebook.ipynb

I retrieve the container image URI with:

# image uri
from sagemaker.huggingface import get_huggingface_llm_image_uri
image_container = get_huggingface_llm_image_uri("huggingface", version="0.9.3")

I expected the deployed endpoint to behave like the `pipeline` class from transformers for this task (text generation), which accepts a list of inputs. However, passing a list as the input does not work.

Is there any way to do batch inference with the SageMaker SDK?
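One client-side workaround, sketched here under the assumption that the endpoint accepts only a single string under "inputs", is to send one request per prompt and run those requests concurrently. The `batch_predict` helper below is hypothetical; `predict` stands for any callable that sends one JSON payload, such as a deployed predictor's `predict` method.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def batch_predict(
    prompts: List[str],
    predict: Callable[[Dict[str, Any]], Any],
    max_workers: int = 4,
) -> List[Any]:
    """Hypothetical helper: fan out one single-prompt request per item.

    Assumes the endpoint rejects a list under "inputs", so each prompt
    is wrapped in its own {"inputs": ..., "parameters": ...} payload.
    """
    payloads = [{"inputs": p, "parameters": {}} for p in prompts]
    # ThreadPoolExecutor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict, payloads))
```

Serverless endpoints limit how many invocations run at once, so it is probably wise to keep `max_workers` at or below the max concurrency configured for the endpoint; otherwise extra requests may be throttled.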

philschmid · 2023-09-07T06:16:14Z

Hello @jmparejaz,

The input schema for the LLM container should be the same: {"inputs": "text", "parameters": {}}. What issue are you seeing? The only difference is that the LLM container supports additional/different parameters; see: https://huggingface.co/blog/sagemaker-huggingface-llm#4-run-inference-and-chat-with-our-model
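A minimal sketch of a request body in that schema follows. The `build_payload` helper is hypothetical, and the check that "inputs" is a single string reflects the behaviour reported in this thread rather than documented container behaviour. The generation parameters shown (`max_new_tokens`, `temperature`, `do_sample`) are illustrative values; check the linked post for the full set the LLM container accepts.

```python
import json
from typing import Any, Dict, Optional


def build_payload(text: str, parameters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: one request body in the {"inputs", "parameters"} schema."""
    if not isinstance(text, str):
        # Assumption based on this thread: "inputs" takes one string, not a list.
        raise TypeError("inputs must be a single string; send one request per prompt")
    # Copy the parameters so later edits by the caller do not mutate the payload.
    return {"inputs": text, "parameters": dict(parameters or {})}


payload = build_payload(
    "What is Amazon SageMaker?",
    {"max_new_tokens": 64, "temperature": 0.7, "do_sample": True},
)
print(json.dumps(payload))
```

With a deployed endpoint, a dict like this can be passed to the predictor's `predict` method, which serializes it to JSON.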
