instruct_pix2pix problem #7678
mechigonft asked this question in Q&A
I'm using the instruct_pix2pix training method to regenerate backgrounds for cut-out food images. However, the generated backgrounds often contain numerous fragmented and distorted cups, plates, and bowls. What could be the reason for this? I've examined my training data: although it also includes cups, plates, and bowls, there is only one of each, and all are in their normal shape. Could you help me look into this issue?
cut-out food image:
![result](https://private-user-images.githubusercontent.com/90537707/322342218-78612310-6b65-400c-8a67-4e5edc05bafb.png)
after regenerating the background:
![result_extend_background](https://private-user-images.githubusercontent.com/90537707/322342235-d67ab877-44ab-427a-bb4b-b6e3b7eb46dc.png)
my training data example:
input_image:
![1](https://private-user-images.githubusercontent.com/90537707/322342444-44c1b829-73a0-4356-b7f8-d06cb4b1c9dc.png)
edited_image:
![1](https://private-user-images.githubusercontent.com/90537707/322342511-5c652ed3-984d-4752-9a39-ba34a13e7747.png)
training script:

```bash
export MODEL_NAME="/models/stable-diffusion-v1-5"
export DATASET_ID=""
export OUTPUT_DIR=""

accelerate launch --mixed_precision="fp16" /ossfs/workspace/diffusers/examples/instruct_pix2pix/train_instruct_pix2pix.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_ID \
  --enable_xformers_memory_efficient_attention \
  --resolution=256 --random_flip \
  --train_batch_size=1 --gradient_accumulation_steps=1 --gradient_checkpointing \
  --max_train_steps=5000 \
  --checkpointing_steps=10000 --checkpoints_total_limit=1 \
  --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
  --conditioning_dropout_prob=0.05 \
  --mixed_precision=fp16 \
  --seed=42 \
  --output_dir=$OUTPUT_DIR
```
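For context, it may be worth sanity-checking that the dataset exposes the columns the example script expects before training. A minimal sketch, assuming the script's default column names (these are configurable via `--original_image_column`, `--edit_prompt_column`, and `--edited_image_column`); the dataset ID below is a placeholder:

```python
from datasets import load_dataset

# Placeholder ID: use the same value you pass as --dataset_name / $DATASET_ID.
dataset = load_dataset("your-dataset-id", split="train")

# By default, train_instruct_pix2pix.py looks for three columns:
# "input_image" (original), "edit_prompt" (instruction), "edited_image" (target).
print(dataset.column_names)
print(dataset[0]["edit_prompt"])
```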
inference script:

```python
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = ''  # <- replace this
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

image_path = '/ossfs/workspace/result.png'

def load_image(image_path):
    # Open the local image, apply any EXIF rotation, and force RGB.
    image = Image.open(image_path)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

image = load_image(image_path)

# Prompts tried (only the last one is active):
# prompt = 'replace the background with a clean and concise background, simple and clean'
# prompt = 'replace the background picture to pure white background'
prompt = 'extend background'

num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

# Negative prompts tried (only the last one is active):
# negative_prompt = 'other food and drinks, white empty cups, white empty bowls, white empty plates, cutlery, knives and forks, chopsticks, complex background'
negative_prompt = 'cups, bowls, plates'

edited_image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("/ossfs/workspace/result_extend_background.png")
```
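As a debugging aid, a small sweep over the two guidance knobs can show whether the fragmented tableware tracks a particular setting. This is only a sketch reusing the variables defined above; the output paths are hypothetical:

```python
# Sweep: higher image_guidance_scale keeps the output closer to the input
# image; higher guidance_scale follows the text prompt more aggressively.
for igs in (1.2, 1.5, 2.0):
    for gs in (5, 7.5, 10):
        out = pipe(
            prompt,
            negative_prompt=negative_prompt,
            image=image,
            num_inference_steps=num_inference_steps,
            image_guidance_scale=igs,
            guidance_scale=gs,
            generator=torch.Generator("cuda").manual_seed(0),
        ).images[0]
        out.save(f"/ossfs/workspace/sweep_igs{igs}_gs{gs}.png")  # hypothetical path
```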
System Info

```
$ diffusers-cli env
Setting ds_accelerator to cuda (auto detect)

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- diffusers version: 0.28.0.dev0
- Platform: Linux-5.10.134-13.al8.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.16
- PyTorch version (GPU?): 2.0.0+cu117 (True)
- Huggingface_hub version: 0.22.2
- Transformers version: 4.33.2
- Accelerate version: 0.21.0
- xFormers version: 0.0.21
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
```