We are trying to train an SD3.5-Large DreamBooth model using the train_dreambooth_sd3.py script.
We are using an Azure server with an A100 GPU (80GB VRAM).
⚠️ We are running out of memory at step 0.
❕It does work without --train_text_encoder. It seems there might be a memory leak or an issue with training the text encoder in the current script/model.
❓Does it make sense that the model uses over 80GB of VRAM?
❓Do you have any recommendations for decreasing VRAM usage, other than:
- 8-bit Adam
- Mixed precision (fp16)
- xformers (which doesn't work with SD3.5)
SD3.5-Medium works on the same machine with the same parameters using 26 GB (80 GB with --train_text_encoder!).
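On the first question: yes, it can make sense. A rough back-of-envelope (our own estimate, not from the script: assuming the SD3.5-Large transformer has roughly 8B parameters, trainable weights and gradients are held in fp32, and 8-bit Adam keeps two 1-byte states per parameter) puts the transformer alone close to the 75.60 GiB the OOM report shows as allocated by PyTorch:

```python
# Back-of-envelope VRAM estimate for full fine-tuning of the transformer.
# Assumptions (ours): ~8B params, fp32 trainable weights + fp32 grads,
# 8-bit Adam = two 1-byte states per parameter. Activations, the VAE and
# the text encoders come on top of this.
params = 8e9

bytes_total = params * (4 + 4 + 2)  # weights + grads + optimizer states
print(f"~{bytes_total / 1024**3:.1f} GiB")  # ~74.5 GiB
```

Training the text encoders on top of that adds their weights, gradients and optimizer states as well, so blowing past 80 GB would be plausible rather than surprising.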
🔨 What we tried:
- Running at lower resolutions (down to 10x10).
- Increasing gradient accumulation steps.
- Debugging the Python file without Accelerate, which still crashed at the optimizer.step() line (see the diagnostic sketch after this list).
- Removing the T5 text encoder (the largest component, ~10 GB) from the script altogether.
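For the no-Accelerate debugging run, a small check like the following, placed just before optimizer.step(), can show how much is already allocated before the optimizer states are created. This is only a sketch with hypothetical model variable names; the torch calls themselves are standard:

```python
import torch

def memory_report(tag: str, **models: torch.nn.Module) -> None:
    # Count trainable parameters per component and dump live CUDA memory.
    for name, model in models.items():
        n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
        print(f"[{tag}] {name}: {n_trainable / 1e9:.2f}B trainable params")
    print(f"[{tag}] allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB, "
          f"reserved: {torch.cuda.memory_reserved() / 1024**3:.1f} GiB")

# Hypothetical call site, just before optimizer.step():
# memory_report("pre-step", transformer=transformer, text_encoder_one=text_encoder_one)
```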
🧪 These are our parameters:
!accelerate launch train_dreambooth_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3.5-large" \
  --output_dir="sd_outputs" \
  --instance_data_dir="ogo" \
  --instance_prompt="the face of ogo person" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --checkpointing_steps=200 \
  --learning_rate=2e-6 \
  --text_encoder_lr=1e-6 \
  --train_text_encoder \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=800 \
  --seed="0" \
  --use_8bit_adam \
  --mixed_precision="fp16"
👨🏻‍💻 Stacktrace:
2024-12-02 12:36:35.615846: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1733142995.629356 226993 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1733142995.633681 226993 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
12/02/2024 12:36:39 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: no
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'base_shift', 'max_image_seq_len', 'max_shift', 'base_image_seq_len', 'invert_sigmas', 'use_dynamic_shifting'} was not found in config. Values will be initialized to default values.
Downloading shards: 100%|███████████████████████| 2/2 [00:00<00:00, 3450.68it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:03<00:00, 1.73s/it]
Fetching 2 files: 100%|█████████████████████████| 2/2 [00:00<00:00, 7476.48it/s]
{'dual_attention_layers'} was not found in config. Values will be initialized to default values.
12/02/2024 12:37:04 - INFO - __main__ - ***** Running training *****
12/02/2024 12:37:04 - INFO - __main__ - Num examples = 1
12/02/2024 12:37:04 - INFO - __main__ - Num batches each epoch = 1
12/02/2024 12:37:04 - INFO - __main__ - Num Epochs = 800
12/02/2024 12:37:04 - INFO - __main__ - Instantaneous batch size per device = 1
12/02/2024 12:37:04 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 2
12/02/2024 12:37:04 - INFO - __main__ - Gradient Accumulation steps = 2
12/02/2024 12:37:04 - INFO - __main__ - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/azureuser/Picturethis/Dima/train_dreambooth_sd3.py", line 1811, in
main(args)
File "/home/azureuser/Picturethis/Dima/train_dreambooth_sd3.py", line 1666, in main
optimizer.step()
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/accelerate/optimizer.py", line 171, in step
self.optimizer.step(closure)
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
return func.__get__(opt, opt.__class__)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/torch/optim/optimizer.py", line 487, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/bitsandbytes/optim/optimizer.py", line 288, in step
self.init_state(group, p, gindex, pindex)
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/bitsandbytes/optim/optimizer.py", line 474, in init_state
state["state2"] = self.get_state_buffer(p, dtype=torch.uint8)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/bitsandbytes/optim/optimizer.py", line 328, in get_state_buffer
return torch.zeros_like(p, dtype=dtype, device=p.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 79.15 GiB of which 10.62 MiB is free. Process 68964 has 530.00 MiB memory in use. Including non-PyTorch memory, this process has 78.45 GiB memory in use. Of the allocated memory 75.60 GiB is allocated by PyTorch, and 2.35 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Steps: 0%| | 0/800 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/home/azureuser/mambaforge/envs/picturevenv/bin/accelerate", line 8, in
sys.exit(main())
^^^^^^
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
simple_launcher(args)
File "/home/azureuser/mambaforge/envs/picturevenv/lib/python3.11/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/azureuser/mambaforge/envs/picturevenv/bin/python3.11', 'train_dreambooth_sd3.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-3.5-large', '--output_dir=sd_outputs', '--instance_data_dir=ogo', '--instance_prompt=the face of ogo person', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=2', '--gradient_checkpointing', '--checkpointing_steps=200', '--learning_rate=2e-6', '--text_encoder_lr=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=800', '--seed=0', '--use_8bit_adam']' returned non-zero exit status 1.
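One small thing worth trying, since the OOM message itself points at the 2.35 GiB that is reserved but unallocated: enabling expandable segments in the CUDA caching allocator. This only mitigates fragmentation and will not win back tens of GiB, but it is cheap to test. The environment variable has to be set before the first CUDA allocation, e.g. at the very top of train_dreambooth_sd3.py:

```python
import os

# Must run before torch creates the CUDA context (i.e. before any tensor
# is moved to the GPU), otherwise the setting is ignored.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
```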