How to inpaint an existing object on top of another existing object? #8374

levoz92 · 2024-05-31T21:39:37Z

levoz92
May 31, 2024

Hi there

Above, I got this Nike shoe.

My goal is simple: take a photo of an actor or actress wearing a pair of shoes and inpaint this Nike shoe in place of what they are wearing. How can I do this so that the inpainted shoe's orientation and angle match those of the shoe in the picture?

asomoza · 2024-06-01T00:16:19Z

asomoza
Jun 1, 2024
Maintainer

Are you asking as someone who doesn't know how to do inpainting, or are you a more advanced user who already knows how to do good inpaints?

As a hobby or test project, you can just use IP Adapters to inpaint the shoes over a person.

At a professional level, the answer is a lot longer:

Getting the exact same level of detail as the real object is really hard; that's what all the virtual try-on (VTON) solutions try to achieve. If you want a ready-to-go solution, your best bet is probably to search for them and test them.

Here's a quick list, but there are many more:

https://github.com/yisol/IDM-VTON
https://github.com/levihsu/OOTDiffusion
https://github.com/bcmi/DCI-VTON-Virtual-Try-On
https://github.com/xiezhy6/GP-VTON

The size of the object also matters: for shoes you could probably get good results with close-ups, but for a full-body shot of a person standing the results will be very bad, same as with faces.

Sneakers are especially hard since they are affected by a common problem in diffusion models: they have a lot of tiny details that get generated incorrectly, the same problem as fingers on hands, details on cars or bikes, etc.

In my experience, this is a lot of work if you want a professional solution: you'll need to train a LoRA, use IP Adapters and ControlNet, and then do a lot of refining with inpainting to fix each detail, just to get one production-ready image. At this point I think it's still more time- and cost-efficient to do it the traditional way: hire the model and take the photos.

11 replies

levoz92 Jun 1, 2024
Author

Will you post the code here?

asomoza Jun 1, 2024
Maintainer

yes

levoz92 Jun 1, 2024
Author

Thank you very much. Looking forward to it.

levoz92 Jun 1, 2024
Author

Any updates?

asomoza Jun 1, 2024
Maintainer

done, posted as another thread to separate it from this one.

asomoza · 2024-06-01T04:15:59Z

asomoza
Jun 1, 2024
Maintainer

I'm going to use this photo from unsplash:

Photo by Paul Gaudriault on Unsplash

I'll do this part manually since it's faster and easier; if I write a guide about this, I'll do it with code. What we need is to rotate, crop and resize the image (with SDXL it's better to work with 1024x1024 images). After that we can just use code: first pass the image through the anyline preprocessor, then generate the image with the mistoline controlnet and one IP adapter.
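The manual rotate, crop and resize step could also be scripted. The following is only a minimal sketch with Pillow, not the code used for the images in this post; it assumes the cutout is an RGBA image with a transparent background, and `prepare_object`, `angle`, `size` and `margin` are hypothetical names chosen for illustration.

```python
from PIL import Image


def prepare_object(image, angle=0.0, size=1024, margin=32):
    """Rotate an RGBA cutout, crop it to its visible pixels and center it on a square canvas."""
    rgba = image.convert("RGBA")
    # expand=True grows the canvas so the rotated object is not clipped.
    rotated = rgba.rotate(angle, resample=Image.BICUBIC, expand=True)
    # The bounding box of the alpha channel is the tight box around the object.
    bbox = rotated.getchannel("A").getbbox()
    if bbox is None:
        raise ValueError("image is fully transparent")
    cropped = rotated.crop(bbox)
    # Scale so the longest side fills the canvas minus the margin on both sides.
    scale = (size - 2 * margin) / max(cropped.size)
    new_size = (max(1, round(cropped.width * scale)), max(1, round(cropped.height * scale)))
    resized = cropped.resize(new_size, Image.LANCZOS)
    # Paste onto a transparent square canvas, using the cutout's own alpha as mask.
    canvas = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    offset = ((size - resized.width) // 2, (size - resized.height) // 2)
    canvas.paste(resized, offset, resized)
    return canvas
```

Pillow rotates counterclockwise for positive angles. The result can be saved as a PNG and loaded with `load_image` like the source image in the script below.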

[Images: source | anyline | result]

Edit: added compel prompt weighting and changed the params a bit to get more consistent results.

First install controlnet_aux and compel. Here's the code:

import torch
from compel import Compel, ReturnedEmbeddingsType
from controlnet_aux import AnylineDetector

from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    DPMSolverMultistepScheduler,
    StableDiffusionXLControlNetPipeline,
)
from diffusers.utils import load_image


source_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/shoe_cropped_nobgpng_rotated_scaled.png?download=true"
)
anyline = AnylineDetector.from_pretrained("TheMistoAI/MistoLine", filename="MTEED.pth", subfolder="Anyline").to("cuda")
preprocessed = anyline(source_image, detect_resolution=1024)

controlnet = ControlNetModel.from_pretrained(
    "TheMistoAI/MistoLine",
    torch_dtype=torch.float16,
    variant="fp16",
)

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")

pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnet, vae=vae
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
    image_encoder_folder="models/image_encoder",
)

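# Apply the IP-Adapter only at the second layer of up block 0;
# blocks that are not listed get a scale of 0.
scale_config = {
    "up": {"block_0": [0.0, 0.9, 0.0]},
}
pipeline.set_ip_adapter_scale(scale_config)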

compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)
prompt = "high quality photo of a (person leg)1.5 wearing a shoe+ on the sidewalk"
conditioning, pooled = compel(prompt)

generator = torch.Generator(device="cpu").manual_seed(216797721)

image = pipeline(
    prompt_embeds=conditioning,
    pooled_prompt_embeds=pooled,
    negative_prompt="bokeh",
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=generator,
    image=preprocessed,
    controlnet_conditioning_scale=0.8,
    control_guidance_end=0.5,
    ip_adapter_image=source_image,
).images[0]

image.save("result.png")

7 replies

levoz92 Jun 1, 2024
Author

Since the input is a shoe with the background removed, isn't it possible to rotate it and feed it to the model to account for different orientations?

asomoza Jun 1, 2024
Maintainer

yeah exactly, you can do whatever you want with the shoe to help the model or to generate different positions. That's what I did when I was asking in the first post:

levoz92 Jun 2, 2024
Author

Is it possible to create orientations of a shoe based on the angle of another one?
If I create a segmentation model that detects someone wearing shoes, how can we use your above logic to align the desired shoe to the same angle as the detected one?

levoz92 Jun 2, 2024
Author

Also, my implementation ends up changing the design of the shoe. Why could that be?

asomoza Jun 2, 2024
Maintainer

If you want the shoe to be more precise, you need to play with the controlnet guidance params:

    controlnet_conditioning_scale=0.8,
    control_guidance_end=0.5,

But if you make it too strong, the generations will be bad because you're taking away the model's freedom to adapt the generation.
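To get a feel for what `control_guidance_end` does, here is a small sketch of which denoising steps the ControlNet is active on. It mirrors my understanding of how the diffusers ControlNet pipelines compute the window, so treat it as an approximation rather than the library's exact code; `controlnet_active_steps` is a hypothetical helper name.

```python
def controlnet_active_steps(num_inference_steps, control_guidance_start=0.0, control_guidance_end=1.0):
    """Return the indices of the denoising steps on which the ControlNet is applied."""
    active = []
    for i in range(num_inference_steps):
        # A step is skipped if it begins before the start fraction
        # or finishes after the end fraction of the schedule.
        before_start = i / num_inference_steps < control_guidance_start
        after_end = (i + 1) / num_inference_steps > control_guidance_end
        if not (before_start or after_end):
            active.append(i)
    return active


steps = controlnet_active_steps(25, control_guidance_end=0.5)
print(len(steps), steps[0], steps[-1])  # 12 0 11
```

With the values from the script, the lines guide roughly the first half of the steps and the model is free for the rest. Raising `control_guidance_end` or `controlnet_conditioning_scale` keeps the shoe closer to the anyline image at the cost of that freedom.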

> Is it possible to create orientations of a shoe based on the angle of another one?

Yes, but that is not a diffusers question; that's just normal code. Sadly, even though it interests me, I don't have the time to also answer implementation questions. I suggest searching for this or even asking a chatbot how to do it.

> If I create a segmentation model that detects someone wearing shoes, how can we use your above logic to align the desired shoe to the same angle as the detected one?

This is an interesting question. Segmentation models give you an area and not an angle, so no, not right out of the box. I don't really know if there's a model that gives you the angle too, but it should be possible to train one for this, or maybe even code something that detects the angle of an object.
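As one possible way to "code something that detects the angle", the principal axis of a segmentation mask can be estimated from its second-order image moments. This is a minimal sketch in plain Python, assuming the mask is a 2D list of 0/1 values; `mask_orientation` is a hypothetical helper, not something from diffusers or from a segmentation library.

```python
import math


def mask_orientation(mask):
    """Estimate the principal-axis angle, in degrees, of the foreground pixels of a 2D mask.

    The angle is measured from the x axis toward the y axis of the pixel grid,
    so with image coordinates (y pointing down) positive values are clockwise on screen.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("mask is empty")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Central second-order moments of the foreground pixels.
    mu20 = sum((x - mean_x) ** 2 for x, _ in points) / n
    mu02 = sum((y - mean_y) ** 2 for _, y in points) / n
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Orientation of the axis along which the pixels are most spread out.
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))
```

Running it on both the detected shoe's mask and the cutout's alpha mask gives two angles whose difference is a candidate rotation. Note that an axis has a 180 degree ambiguity (it can't tell the toe from the heel), and the sign convention has to be matched to whatever rotate function is used, so this only covers part of the problem.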

Your question is really more complex now and it's more like the first one: if you really want to preserve the full detail of the shoes, it would be better to use an inpaint model with a trained LoRA, or maybe do an image-to-image pass (with low denoise) over the pasted shoes.

You can also try openpose or dwpose to detect the person and maybe try to guess the position or the angle of the shoes.

But these are all questions that are outside the scope of diffusers and belong more to full solutions like the virtual try-on ones.
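For the "pasted shoes" variant, the paste itself is plain image code. Below is a minimal Pillow sketch, assuming an RGBA cutout and a target box that comes from somewhere else (for example a detector); `paste_cutout` and `box` are hypothetical names for illustration.

```python
from PIL import Image


def paste_cutout(photo, cutout, box):
    """Paste an RGBA cutout into `box` (left, top, right, bottom) of a photo.

    Returns the composited RGB image and an L-mode mask that is white
    where the cutout was pasted.
    """
    left, top, right, bottom = box
    resized = cutout.convert("RGBA").resize((right - left, bottom - top), Image.LANCZOS)
    # convert() returns a new image, so the original photo is left untouched.
    composite = photo.convert("RGB")
    composite.paste(resized, (left, top), resized)
    # The cutout's alpha channel doubles as a mask of the pasted region.
    mask = Image.new("L", photo.size, 0)
    mask.paste(resized.getchannel("A"), (left, top))
    return composite, mask
```

The composite could then be passed to an image-to-image pipeline such as `StableDiffusionXLImg2ImgPipeline` with a low `strength`, or together with the mask to an inpaint pipeline, so the model blends the edges and lighting without redrawing the shoe.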
