instruct_pix2pix problem #7678
mechigonft asked this question in Q&A
I'm using the instruct_pix2pix training method to regenerate backgrounds for cut-out food images. However, the generated backgrounds often contain numerous fragmented and distorted cups, plates, and bowls. What could be the reason for this? I've examined my training data: although it also includes cups, plates, and bowls, there is only one of each, and all are in their normal shape. Could you help me look into this issue?
cut-out food image:
![result](https://private-user-images.githubusercontent.com/90537707/322342218-78612310-6b65-400c-8a67-4e5edc05bafb.png)
after regenerating the background:
![result_extend_background](https://private-user-images.githubusercontent.com/90537707/322342235-d67ab877-44ab-427a-bb4b-b6e3b7eb46dc.png)
my training data example:
input_image:
![1](https://private-user-images.githubusercontent.com/90537707/322342444-44c1b829-73a0-4356-b7f8-d06cb4b1c9dc.png)
edited_image:
![1](https://private-user-images.githubusercontent.com/90537707/322342511-5c652ed3-984d-4752-9a39-ba34a13e7747.png)
training script:

```bash
export MODEL_NAME="/models/stable-diffusion-v1-5"
export DATASET_ID=""
export OUTPUT_DIR=""

accelerate launch --mixed_precision="fp16" /ossfs/workspace/diffusers/examples/instruct_pix2pix/train_instruct_pix2pix.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_ID \
  --enable_xformers_memory_efficient_attention \
  --resolution=256 --random_flip \
  --train_batch_size=1 --gradient_accumulation_steps=1 --gradient_checkpointing \
  --max_train_steps=5000 \
  --checkpointing_steps=10000 --checkpoints_total_limit=1 \
  --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
  --conditioning_dropout_prob=0.05 \
  --mixed_precision=fp16 \
  --seed=42 \
  --output_dir=$OUTPUT_DIR
```
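For context, it may be worth sanity-checking that the dataset exposes the columns the example script expects before training. A minimal sketch, assuming the script's default column names (these are configurable via `--original_image_column`, `--edit_prompt_column`, and `--edited_image_column`); the dataset ID below is a placeholder:

```python
from datasets import load_dataset

# Placeholder ID: use the same value you pass as --dataset_name / $DATASET_ID.
dataset = load_dataset("your-dataset-id", split="train")

# By default, train_instruct_pix2pix.py looks for three columns:
# "input_image" (original), "edit_prompt" (instruction), "edited_image" (target).
print(dataset.column_names)
print(dataset[0]["edit_prompt"])
```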
inference script:

```python
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = ''  # <- replace this
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

image_path = '/ossfs/workspace/result.png'

def load_image(image_path):
    # Open the local image, apply any EXIF rotation, and force RGB.
    image = Image.open(image_path)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

image = load_image(image_path)

# Prompts tried (only the last one is active):
# prompt = 'replace the background with a clean and concise background, simple and clean'
# prompt = 'replace the background picture to pure white background'
prompt = 'extend background'

num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

# Negative prompts tried (only the last one is active):
# negative_prompt = 'other food and drinks, white empty cups, white empty bowls, white empty plates, cutlery, knives and forks, chopsticks, complex background'
negative_prompt = 'cups, bowls, plates'

edited_image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("/ossfs/workspace/result_extend_background.png")
```
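As a debugging aid, a small sweep over the two guidance knobs can show whether the fragmented tableware tracks a particular setting. This is only a sketch reusing the variables defined above; the output paths are hypothetical:

```python
# Sweep: higher image_guidance_scale keeps the output closer to the input
# image; higher guidance_scale follows the text prompt more aggressively.
for igs in (1.2, 1.5, 2.0):
    for gs in (5, 7.5, 10):
        out = pipe(
            prompt,
            negative_prompt=negative_prompt,
            image=image,
            num_inference_steps=num_inference_steps,
            image_guidance_scale=igs,
            guidance_scale=gs,
            generator=torch.Generator("cuda").manual_seed(0),
        ).images[0]
        out.save(f"/ossfs/workspace/sweep_igs{igs}_gs{gs}.png")  # hypothetical path
```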
System Info

```
$ diffusers-cli env
Setting ds_accelerator to cuda (auto detect)

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- diffusers version: 0.28.0.dev0
- Platform: Linux-5.10.134-13.al8.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.16
- PyTorch version (GPU?): 2.0.0+cu117 (True)
- Huggingface_hub version: 0.22.2
- Transformers version: 4.33.2
- Accelerate version: 0.21.0
- xFormers version: 0.0.21
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
```