Sagemaker HuggingfaceModel fails on phi3 model deployment #123

manikawnth opened this issue May 21, 2024 · 2 comments
@manikawnth

I'm not able to deploy the Phi-3 model from the Hugging Face model hub to SageMaker.
I tried multiple DLC containers, with and without `trust_remote_code: true`, but still can't get it to run.

I receive the following error:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 258, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 222, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 420, in get_model
    return FlashLlama(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 84, in __init__
    model = FlashLlamaForCausalLM(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 368, in __init__
    self.model = FlashLlamaModel(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 292, in __init__
    [
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 293, in <listcomp>
    FlashLlamaLayer(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 232, in __init__
    self.self_attn = FlashLlamaAttention(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 108, in __init__
    self.query_key_value = load_attention(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 43, in load_attention
    bias = config.attention_bias
  File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 263, in __getattribute__
    return super().__getattribute__(key)

AttributeError: 'Phi3Config' object has no attribute 'attention_bias' (rank=0)


2024-05-21T16:19:40.764815Z ERROR text_generation_launcher: Shard 0 failed to start
2024-05-21T16:19:40.764834Z  INFO text_generation_launcher: Shutting down shards

Error: ShardCannotStart
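For context, the traceback shows TGI routing Phi-3 through its Llama code path, whose `load_attention` reads `config.attention_bias` directly. `LlamaConfig` defines that attribute, but `Phi3Config` did not at the time, hence the `AttributeError`. A minimal sketch of the failure mode and a defensive lookup that would avoid it (the stub config class below is hypothetical, not TGI code):

```python
class Phi3ConfigStub:
    """Hypothetical stand-in for transformers' Phi3Config, which
    (unlike LlamaConfig) defines no attention_bias attribute."""
    hidden_size = 3072


config = Phi3ConfigStub()

# TGI's load_attention effectively does `bias = config.attention_bias`,
# which raises AttributeError for Phi-3 configs.
try:
    bias = config.attention_bias
except AttributeError:
    # A getattr fallback with a default avoids the crash.
    bias = getattr(config, "attention_bias", False)

print(bias)  # False
```

This is why `TRUST_REMOTE_CODE` makes no difference here: the shard dies inside TGI's own modeling code before any remote code would be consulted.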

from sagemaker import get_execution_role, Session
import boto3
sagemaker_session = Session()
region = boto3.Session().region_name

# get the execution role (use the notebook instance's role, or pass a different role ARN)
execution_role = get_execution_role()

image_uri = '763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.0.3-gpu-py310-cu121-ubuntu22.04-v2.0'

from sagemaker.huggingface import HuggingFaceModel

hub = {
  'HF_TASK': 'text-generation',
  'HF_MODEL_ID':'microsoft/Phi-3-mini-128k-instruct',
  'TRUST_REMOTE_CODE': 'true',
  'HF_MODEL_TRUST_REMOTE_CODE': 'true'
}

huggingface_model = HuggingFaceModel(
    env=hub,
    image_uri=image_uri,
    role=execution_role,
    sagemaker_session=sagemaker_session
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
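For completeness, if the shard did start, the endpoint would be invoked with a TGI-style JSON body: an `inputs` string plus an optional `parameters` object. A sketch of the request payload (the prompt and parameter values are placeholders):

```python
import json

# TGI-style request body: an "inputs" prompt plus generation parameters.
payload = {
    "inputs": "What is Phi-3?",
    "parameters": {"max_new_tokens": 64, "temperature": 0.2},
}

# Against a live endpoint this would be: predictor.predict(payload)
body = json.dumps(payload)
```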
@philschmid
Collaborator

@manikawnth
Author

manikawnth commented May 24, 2024
