Questions on large model inference / finetuning #3353
Here is my accelerate environment --
#1890 -- I think this feature request is highly relevant to my question.
I guess what you are trying to do is somewhat similar to training the model and, at some point during training, evaluating it.
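If that is the goal, a loop along these lines might work (just a sketch; `train_dataloader`, `eval_dataloader`, and `eval_every` are placeholders, and the model, optimizer, and dataloaders are assumed to have already gone through `accelerator.prepare`):

```python
# Rough sketch of evaluating periodically during training; all names here
# are placeholders, and batches are assumed to contain labels so that the
# model returns a loss.
model.train()
for step, batch in enumerate(train_dataloader):
    outputs = model(**batch)
    accelerator.backward(outputs.loss)
    optimizer.step()
    optimizer.zero_grad()

    if step % eval_every == 0:
        model.eval()
        with torch.no_grad():
            for eval_batch in eval_dataloader:
                eval_loss = model(**eval_batch).loss
        model.train()
```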
Hi @muellerzr!
I am trying to run the Llama 8B model on A40 GPUs using accelerate. I want to first evaluate the model, then add a few trainable parameters and train only those. Since the Llama 8B checkpoint cannot fit on a single A40, I am using an FSDP configuration. (Is that the correct choice?)
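Roughly, the setup I am aiming for looks like this (a sketch only; the checkpoint name and the extra trainable layer are placeholders, not my actual code):

```python
# Sketch of the intended setup: freeze the base model, add a few trainable
# parameters, and let accelerate shard everything with FSDP.
# The checkpoint name and `extra_head` are placeholders for illustration.
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()  # launched with an FSDP config via `accelerate launch`

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder checkpoint
    torch_dtype=torch.bfloat16,
)

# Freeze the base model and add a small trainable module.
for p in model.parameters():
    p.requires_grad = False
model.extra_head = torch.nn.Linear(model.config.hidden_size, model.config.hidden_size)

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)

# prepare() wraps the model in FSDP and shards it across the available GPUs.
# Note: mixing frozen and trainable parameters under FSDP may require
# `use_orig_params: true` in the FSDP config.
model, optimizer = accelerator.prepare(model, optimizer)
```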
When I run `accelerate launch`, the code enters the following method from utils/fsdp_utils.py:
```python
def load_fsdp_model(fsdp_plugin, accelerator, model, input_dir, model_index=0, adapter_only=False):
```
which then raises the following error:
I went through the documentation -- https://huggingface.co/docs/accelerate/en/usage_guides/distributed_inference
as well as https://huggingface.co/docs/accelerate/en/usage_guides/fsdp -- am I missing something here? Any help, documentation, or tutorial on how to run/finetune/train large models where a single GPU's memory is not sufficient, using some sort of model sharding with accelerate, would be really helpful!
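For the evaluation-only part, I also considered big-model loading with `device_map="auto"` instead of FSDP -- a sketch of what I mean (the checkpoint name is a placeholder, and I am not sure this is the recommended approach):

```python
# Sketch of evaluation with automatic layer placement; the checkpoint name
# is a placeholder for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # accelerate splits layers across the available GPUs
)

inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```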
Thanks,
Kalyani