Currently, all models with sharded checkpoints, such as the two below, fail to deploy: the toolkit filters downloaded files against a predefined allowlist, and the sharded checkpoint format isn't included in that list.
I've opened a PR that fixes this issue in #93, but until it is merged you may be able to work around it by building a custom Docker image from my fork, like so:
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04
RUN pip install --no-cache-dir \
git+https://github.com/JimAllanson/sagemaker-huggingface-inference-toolkit@sharded-checkpoint-support
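To make the failure mode concrete, here is a minimal sketch of the kind of filename filtering described above. The `ALLOWED` set and helper names are hypothetical (they are not the toolkit's actual identifiers); the point is that an allowlist of exact, non-sharded filenames silently drops every shard of a sharded checkpoint, so no weights reach the container:

```python
import re

# Hypothetical allowlist in the spirit of the toolkit's filter: exact
# well-known filenames only, with no entry for sharded weight files.
ALLOWED = {
    "config.json",
    "pytorch_model.bin",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
}

# Sharded checkpoints split the weights across numbered files plus an index:
SHARDED_FILES = [
    "pytorch_model-00001-of-00003.bin",
    "pytorch_model-00002-of-00003.bin",
    "pytorch_model-00003-of-00003.bin",
    "pytorch_model.bin.index.json",
]

def is_allowed(filename: str) -> bool:
    """Exact-match allowlist check, as a stand-in for the real filter."""
    return filename in ALLOWED

# Every shard is filtered out, so the model has no weights to load
# and the endpoint fails its health check.
kept = [f for f in SHARDED_FILES if is_allowed(f)]
print(kept)  # []

# A fix along the lines of the PR would also admit the sharded pattern:
SHARD_RE = re.compile(
    r"pytorch_model-\d{5}-of-\d{5}\.bin|pytorch_model\.bin\.index\.json"
)
kept_fixed = [f for f in SHARDED_FILES if is_allowed(f) or SHARD_RE.fullmatch(f)]
print(kept_fixed)  # all four sharded files survive the filter
```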
Background
We are attempting to deploy SageMaker endpoints using the code provided under "Deploy → Amazon SageMaker" on huggingface.co for these two models:
https://huggingface.co/Salesforce/codegen25-7b-multi
https://huggingface.co/openchat/opencoderplus
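For reference, the huggingface.co "Deploy → Amazon SageMaker" snippet follows this general shape (a sketch, not our exact script: the instance type and version pins here are assumptions, and running it requires AWS credentials and a SageMaker execution role):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration; HF_MODEL_ID triggers a download of the
# checkpoint at startup, which is where the file filtering happens.
hub = {
    "HF_MODEL_ID": "Salesforce/codegen25-7b-multi",
    "HF_TASK": "text-generation",
}

huggingface_model = HuggingFaceModel(
    transformers_version="4.28.1",  # assumed; match your container
    pytorch_version="2.0.0",
    py_version="py310",
    env=hub,
    role=role,
)

# Deployment hangs and then fails the health check for sharded models.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumed instance type
)
```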
Error
Both endpoints consistently fail to deploy; each fails its health check. Error logs are available on request, as it does not appear I can attach them here.