train_dreambooth_lora_sdxl.py produces zoomed/cropped images #7327
-
These are just some ideas that come to mind. I haven't trained with black backgrounds, but speaking in tensor math the black pixels are 0s, so the model is probably learning that the black background is something it doesn't need. I could be wrong about this, but maybe you can add a very small amount of brightness so that it thinks there is data there. As for the zoom and cropping, you can also teach the model those concepts: add some zoomed and cropped images and caption them with that information, e.g. "zoomed 1.5x", "zoomed 2x", "zoomed 3x", "without zoom", and the same for cropping. That way, when you generate, you can put those terms in the positive or negative prompt to get more control over the output.
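For the brightness idea, something along these lines could lift the pure-black background slightly before training (just a sketch; the file names and the offset value are placeholders to tune):

```python
# Sketch: add a tiny constant offset so the background is not exactly 0.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("mammogram.png").convert("L"), dtype=np.float32)
img = np.clip(img + 2.0, 0, 255)  # small brightness lift; adjust to taste
Image.fromarray(img.astype(np.uint8)).save("mammogram_lifted.png")
```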
-
Hi,
I am using train_dreambooth_lora_sdxl.py (version from 28.02, commit 7db935a) to generate mammography images. This is the command I run:
```bash
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path='stabilityai/stable-diffusion-xl-base-1.0' \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --cache_dir='.../Project/cache_dir' \
  --dataset_name='.../Project/DATASET' \
  --image_column="image" \
  --caption_column="text" \
  --repeats=1 \
  --instance_prompt="In the style of MaGHY" \
  --validation_prompt="In the style of MaGHY, a MLO mammogram." \
  --num_validation_images=4 \
  --validation_epochs=1 \
  --output_dir='.../Project/OUTPUT/03_RUN' \
  --seed=42 \
  --resolution=1024 \
  --train_text_encoder \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --max_train_steps=2000 \
  --checkpointing_steps=100 \
  --checkpoints_total_limit=100 \
  --gradient_accumulation_steps=5 \
  --gradient_checkpointing \
  --learning_rate=2e-04 \
  --text_encoder_lr=5e-6 \
  --lr_scheduler="constant" \
  --snr_gamma=5.0 \
  --lr_warmup_steps=500 \
  --lr_num_cycles=1 \
  --lr_power=1.0 \
  --dataloader_num_workers=0 \
  --optimizer="AdamW" \
  --adam_beta1=0.9 \
  --adam_beta2=0.999 \
  --adam_weight_decay=1e-04 \
  --adam_weight_decay_text_encoder=1e-03 \
  --adam_epsilon=1e-08 \
  --max_grad_norm=1.0 \
  --report_to=wandb \
  --mixed_precision="fp16" \
  --prior_generation_precision="fp16" \
  --local_rank=-1 \
  --use_8bit_adam \
  --rank=4
```
All images in the training set:
The issue is that the generated images are cropped/zoomed:
![image](https://private-user-images.githubusercontent.com/54419373/312883136-d1276fc5-1f64-471c-91ed-e37ea6d044cc.png)
I tried the following things:
- The scheduler's `clip_sample` option (bool, defaults to True): "Clip the predicted sample for numerical stability."
- Passing the crop conditioning explicitly at inference: `output = pipeline(prompt=prompt, crops_coords_top_left=(0,0)).images[0]`
- Describing the layout in the prompt: "In the style of MaGHY, a MLO mammogram with black background on the right side."

All of these consistently resulted in zoomed/cropped images.
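For reference, a fuller version of that generation call with the SDXL size-conditioning arguments spelled out would look roughly like this (a sketch; the extra `original_size`/`target_size` arguments are just the pipeline defaults made explicit, and the LoRA path is abbreviated as above):

```python
# Sketch of the inference call (paths abbreviated as in the training command).
import torch
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipeline.load_lora_weights(".../Project/OUTPUT/03_RUN")

prompt = "In the style of MaGHY, a MLO mammogram."
output = pipeline(
    prompt=prompt,
    crops_coords_top_left=(0, 0),  # attempt to disable the crop conditioning
    original_size=(1024, 1024),    # SDXL micro-conditioning, default values
    target_size=(1024, 1024),
).images[0]
output.save("sample.png")
```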
I would be grateful for any ideas about what else I could try.