I'm not able to deploy the Phi-3 model from the Hugging Face model hub to SageMaker.
I tried multiple DLC containers, with and without `trust_remote_code: true`, but still can't get it to run.
I receive the following error:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 258, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 222, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 420, in get_model
return FlashLlama(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 84, in __init__
model = FlashLlamaForCausalLM(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 368, in __init__
self.model = FlashLlamaModel(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 292, in __init__
[
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 293, in <listcomp>
FlashLlamaLayer(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 232, in __init__
self.self_attn = FlashLlamaAttention(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 108, in __init__
self.query_key_value = load_attention(config, prefix, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 43, in load_attention
bias = config.attention_bias
File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 263, in __getattribute__
return super().__getattribute__(key)
AttributeError: 'Phi3Config' object has no attribute 'attention_bias' rank=0
2024-05-21T16:19:40.764815Z ERROR text_generation_launcher: Shard 0 failed to start
2024-05-21T16:19:40.764834Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
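For context, the traceback shows TGI routing Phi-3 through its Llama code path (`FlashLlama`), whose `load_attention` reads `config.attention_bias` directly, while `Phi3Config` does not define that attribute. The failure can be reproduced outside SageMaker with a minimal sketch (assuming a transformers version where `Phi3Config` lacks the field):

```python
from transformers import AutoConfig

# Load the Phi-3 config the same way the server resolves it.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
)

# On transformers versions where Phi3Config does not define
# `attention_bias`, this raises the same AttributeError as the shard.
print(config.attention_bias)
```

For reference, this is the deployment code I used: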
```python
from sagemaker import get_execution_role, Session
from sagemaker.huggingface import HuggingFaceModel
import boto3

sagemaker_session = Session()
region = boto3.Session().region_name

# Get the execution role: use get_execution_role() on a notebook instance,
# or substitute the role ARN if you are using a different role.
execution_role = get_execution_role()

image_uri = '763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.0.3-gpu-py310-cu121-ubuntu22.04-v2.0'

hub = {
    'HF_TASK': 'text-generation',
    'HF_MODEL_ID': 'microsoft/Phi-3-mini-128k-instruct',
    'TRUST_REMOTE_CODE': 'true',
    'HF_MODEL_TRUST_REMOTE_CODE': 'true'
}

huggingface_model = HuggingFaceModel(
    env=hub,
    image_uri=image_uri,
    role=execution_role,
    sagemaker_session=sagemaker_session
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge"
)
```
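For completeness, this is how I would invoke the endpoint once the shard starts (a minimal sketch; the payload follows the standard TGI schema of `inputs` plus optional `parameters`):

```python
# Hypothetical invocation once deployment succeeds.
response = predictor.predict({
    "inputs": "Explain what Phi-3 is in one sentence.",
    "parameters": {"max_new_tokens": 64},
})
print(response)
```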
@philschmid Thanks for that PR. It's working fine when I point it to that revision.
However, shouldn't the issue actually be fixed upstream, by initializing `config.attention_bias = False`?
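For illustration, the upstream change I have in mind could either default the attribute on `Phi3Config` itself or read it defensively in TGI's `load_attention`; a sketch of the defensive variant (not the actual patch):

```python
# Sketch of a defensive version of line 43 in
# text_generation_server/models/custom_modeling/flash_llama_modeling.py.
# Current code (crashes for configs without the attribute):
#     bias = config.attention_bias
# Defensive read, defaulting to False for configs such as Phi3Config:
bias = getattr(config, "attention_bias", False)
```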