Make DEFAULT_HF_HUB_MODEL_EXPORT_DIRECTORY configurable #115

Open
pcolazurdo opened this issue Mar 20, 2024 · 0 comments

The definition DEFAULT_HF_HUB_MODEL_EXPORT_DIRECTORY = os.path.join(os.getcwd(), ".sagemaker/mms/models") forces the export directory to live under the current working directory of the running process. On some SageMaker instances this is a relatively small partition that can't be extended. Allowing this value to be overridden through an environment variable would make it possible to download larger models on a variety of instance types (e.g. ml.g5.16xlarge).
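
One way the requested override could look, as a minimal sketch rather than the toolkit's actual code. The environment variable name HF_HUB_MODEL_EXPORT_DIRECTORY and the helper resolve_export_directory are hypothetical names chosen for illustration; only the default path comes from the line quoted above.

```python
import os

# Hypothetical variable name for illustration; not defined by the toolkit.
EXPORT_DIR_ENV_VAR = "HF_HUB_MODEL_EXPORT_DIRECTORY"


def resolve_export_directory(environ=None):
    """Return the export directory, preferring the environment override."""
    environ = os.environ if environ is None else environ
    # Current behavior: always relative to the process working directory.
    default = os.path.join(os.getcwd(), ".sagemaker/mms/models")
    # An unset or empty variable falls back to the current default.
    return environ.get(EXPORT_DIR_ENV_VAR) or default


DEFAULT_HF_HUB_MODEL_EXPORT_DIRECTORY = resolve_export_directory()
```

With something like this in place, the export directory could be set in the same env dict that already carries HF_HOME and HF_HUB_CACHE, pointing it at a volume with enough space.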

To reproduce the problem, you can try this particular model (other large models fail the same way):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# Assumes this runs where a SageMaker execution role is available (e.g. a notebook instance)
role = sagemaker.get_execution_role()

# Hub model configuration, with the Hugging Face caches redirected to /tmp
hub = {
    'HF_MODEL_ID': 'Salesforce/instructblip-flan-t5-xxl',
    'HF_TASK': 'image-to-text',
    'SM_NUM_GPUS': '1',
    'HF_HOME': '/tmp/hf_home',
    'HF_ASSETS_CACHE': '/tmp/hf_assets_cache',
    'HF_DATASETS_CACHE': '/tmp/hf_cache',
    'HF_DATASETS_HOME': '/tmp/hf_home',
    'HF_HUB_CACHE': '/tmp/hf_hub_cache',
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type='ml.g5.16xlarge',  # ec2 instance type
    # volume_size=256
)

The error in CloudWatch is similar to:

OSError: [Errno 28] No space left on device: '/tmp/hf_hub_cache/tmpd1hcphh0' -> '/.sagemaker/mms/models/Salesforce__instructblip-flan-t5-xxl/pytorch_model-00001-of-00005.bin'
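
The traceback shows a move from /tmp/hf_hub_cache to /.sagemaker/mms/models, which suggests the working directory is / and the destination sits on the root partition. A minimal sketch for comparing free space on the two locations, assuming it is run inside the same container; the helpers nearest_existing and free_gib are illustrative names, not part of any library.

```python
import os
import shutil


def nearest_existing(path):
    """Walk up from path until an existing directory or file is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def free_gib(path):
    """Free space, in GiB, on the filesystem that would hold path."""
    return shutil.disk_usage(nearest_existing(path)).free / 2**30


if __name__ == "__main__":
    export_dir = os.path.join(os.getcwd(), ".sagemaker/mms/models")
    for p in ("/tmp/hf_hub_cache", export_dir):
        print(f"{p}: {free_gib(p):.1f} GiB free")
```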