Support zero-3 for FLUX training #10743
Comments
lora + deepspeed won't work, unfortunately
@bghira did it work on megatron?
the problem is a bug in the interaction between Diffusers, Accelerate, PEFT, and DeepSpeed, none of which were involved in that Megatron training run :D
@bghira I see. Sorry for my unclear phrasing; my question is whether we can use Megatron for Flux training on 8 GPUs with 32 GB each, a setup that hasn't been mentioned in any of the related issues.
This bug is caused by the embedding layer of the text encoder being partitioned across different GPUs. If the text encoder's parameters are gathered, the error is not raised. However, doing so leaves only a slight memory saving relative to ZeRO stage 2. Even after gathering the encoder parameters, I still couldn't fine-tune all parameters.
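A minimal sketch of that kind of gather, assuming the standard `deepspeed.zero.GatheredParameters` context manager (`text_encoder` and `input_ids` are placeholders for the objects in the training script):

```python
import deepspeed

# Gather the ZeRO-3-partitioned text-encoder parameters for the duration of the
# forward pass, so the embedding lookup sees a full 2-D weight instead of a
# flattened shard. modifier_rank=None means the parameters are only read, not modified.
with deepspeed.zero.GatheredParameters(list(text_encoder.parameters()), modifier_rank=None):
    prompt_embeds = text_encoder(input_ids)[0]
```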
Thanks. I disabled zero init with a context manager and did not use this function.
@bghira hi, I successfully ran the model with ZeRO-3 by disabling zero init only for the encoder models; here are my modifications:
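Roughly, a sketch of that change, assuming a recent Accelerate that exposes `DeepSpeedPlugin.zero3_init_context_manager` (`args.pretrained_model_name_or_path` follows the DreamBooth script):

```python
from accelerate.state import AcceleratorState
from transformers import CLIPTextModel, T5EncoderModel

# Temporarily disable ZeRO-3 zero.Init while constructing the text encoders, so
# their weights (including the embedding tables) are materialized in full instead
# of being partitioned at construction time. The FLUX transformer itself is still
# loaded and sharded under ZeRO-3 as usual.
deepspeed_plugin = AcceleratorState().deepspeed_plugin
with deepspeed_plugin.zero3_init_context_manager(enable=False):
    text_encoder_one = CLIPTextModel.from_pretrained(
        args.pretrained_model_name_or_path, subfolder="text_encoder"
    )
    text_encoder_two = T5EncoderModel.from_pretrained(
        args.pretrained_model_name_or_path, subfolder="text_encoder_2"
    )
```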
And there is an acceptable margin of error between ZeRO-2 and ZeRO-3. However, I noticed an issue: the model's parameters are not sharded before being loaded onto the GPUs; instead, the entire model is loaded during ... I discovered that this issue arises because ...
Should we consider enhancing the implementation of the diffusers class?
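For reference, a rough sketch of what ZeRO-3-aware construction could look like for the diffusers transformer, mirroring what transformers does under `deepspeed.zero.Init` (`ds_config` is assumed to be the same ZeRO-3 config dict given to Accelerate; loading the pretrained weights would still need separate handling):

```python
import deepspeed
from diffusers import FluxTransformer2DModel

# Hypothetical: build the transformer under deepspeed.zero.Init so its parameters
# are partitioned across ranks at creation time rather than materialized in full
# on every GPU.
config = FluxTransformer2DModel.load_config(
    args.pretrained_model_name_or_path, subfolder="transformer"
)
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    transformer = FluxTransformer2DModel.from_config(config)
# Loading the pretrained weights into the partitioned parameters would still need a
# ZeRO-3-aware path (e.g. rank-0 loading plus GatheredParameters with modifier_rank=0).
```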
Describe the bug
Due to memory limitations, I am attempting to use Zero-3 for Flux training on 8 GPUs with 32GB each. I encountered a bug similar to the one reported in this issue: #1865. I made modifications based on the solution proposed in this pull request: #3076. However, the same error persists. In my opinion, the fix does not work as expected, at least not entirely. Could you advise on how to modify it further?
The relevant code from https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora_flux.py#L1157 has been updated as follows:
Reproduction
deepspeed config:
accelerate config:
training shell:
Logs
RuntimeError: 'weight' must be 2-D
System Info
pytorch: 2.1.0
deepspeed: 0.14.0
accelerate: 1.3.0
diffusers: develop
Who can help?
No response