Are you asking from the perspective of someone who doesn't know how to do inpainting, or are you a more advanced user who already knows how to do good inpaints? As a hobby or test project, you can just use IP Adapters to inpaint the shoes over a person.

At a professional level, it's a much longer answer. Getting the exact same level of detail as the real object is really hard, that's what all the ... Here's a quick list, but there are a lot more: https://github.com/yisol/IDM-VTON

The size of the object also matters. For shoes you could probably get good results with close-ups, but for a full-body shot of a person standing, the results will be very bad, same as with faces. Sneakers are especially hard because they are affected by a common problem in diffusion models: they have a lot of tiny details that get generated incorrectly, the same problem as fingers on hands, details on cars or bikes, etc.

In my experience, this is a lot of work if you want a professional solution. You'll need to train a LoRA, use IP Adapters and ControlNet, and then do a lot of refining with inpainting to fix each detail, just to get one production-ready image. At this point I think it's still more time- and cost-efficient to do it the traditional way: hire the model and take the photos.
-
I'm going to use this photo from Unsplash (photo by Paul Gaudriault on Unsplash). I'll do this first part manually since it's faster and easier; if I write a guide about this, I'll do it with code. What we need is to rotate, crop, and resize the image (it's better to work with 1024x1024 images with SDXL). After that we can just use code: first pass the image through the Anyline preprocessor, then generate the image with the MistoLine ControlNet and one IP Adapter.
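The manual rotate and resize step could be sketched in code roughly as follows. This is only an illustration of the arithmetic, not what was actually used for the image here: the helper names `rotation_to_match` and `fit_to_square`, and the idea of picking a heel point and a toe point on each shoe, are assumptions made for the example.

```python
import math


def rotation_to_match(src_heel, src_toe, dst_heel, dst_toe):
    """Degrees to rotate the source shoe so its heel->toe axis matches the target's.

    Points are (x, y) pixel coordinates picked by hand (hypothetical inputs).
    The result is normalised to the range [-180, 180).
    """
    src = math.atan2(src_toe[1] - src_heel[1], src_toe[0] - src_heel[0])
    dst = math.atan2(dst_toe[1] - dst_heel[1], dst_toe[0] - dst_heel[0])
    return (math.degrees(dst - src) + 180.0) % 360.0 - 180.0


def fit_to_square(width, height, target=1024, margin=0):
    """Scale a width x height crop to fit a target x target canvas, keeping aspect ratio.

    Returns (new_width, new_height, offset_x, offset_y), where the offsets
    centre the resized crop on the square canvas.
    """
    available = target - 2 * margin
    scale = available / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    return new_w, new_h, (target - new_w) // 2, (target - new_h) // 2


if __name__ == "__main__":
    # A 2000x1000 crop of the shoe placed on a 1024x1024 canvas.
    print(fit_to_square(2000, 1000))
    # Source shoe points right, target shoe points straight down the image.
    print(rotation_to_match((0, 0), (1, 0), (0, 0), (0, 1)))
```

The sizes and offsets could then be passed to Pillow's `Image.resize` and `Image.paste`. Note the sign convention: with image coordinates (y pointing down) a positive angle here is clockwise on screen, while Pillow's `Image.rotate` treats positive angles as counter-clockwise, so the value would need to be negated there.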
Edit: added compel prompt weighting and tweaked the params a bit to get more consistent results.
Install the packages imported below first.

import torch
from compel import Compel, ReturnedEmbeddingsType
from controlnet_aux import AnylineDetector
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    DPMSolverMultistepScheduler,
    StableDiffusionXLControlNetPipeline,
)
from diffusers.utils import load_image

# The shoe photo, already rotated, cropped, scaled and with the background removed.
source_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/shoe_cropped_nobgpng_rotated_scaled.png?download=true"
)

# Extract the line art of the shoe with the Anyline preprocessor.
anyline = AnylineDetector.from_pretrained("TheMistoAI/MistoLine", filename="MTEED.pth", subfolder="Anyline").to("cuda")
preprocessed = anyline(source_image, detect_resolution=1024)

# MistoLine ControlNet, conditioned on the line art.
controlnet = ControlNetModel.from_pretrained(
    "TheMistoAI/MistoLine",
    torch_dtype=torch.float16,
    variant="fp16",
)

# VAE that is safe to run in fp16 with SDXL.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")

pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnet, vae=vae
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
    image_encoder_folder="models/image_encoder",
)

# Enable the IP Adapter only in the second attention layer of up block 0.
scale_config = {
    "up": {"block_0": [0.0, 0.9, 0.0]},
}
pipeline.set_ip_adapter_scale(scale_config)

# Compel builds weighted prompt embeddings for both SDXL text encoders.
compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)

# "(person leg)1.5" gets an explicit weight of 1.5; "shoe+" is slightly upweighted.
prompt = "high quality photo of a (person leg)1.5 wearing a shoe+ on the sidewalk"
conditioning, pooled = compel(prompt)

# Fixed seed so the result is reproducible.
generator = torch.Generator(device="cpu").manual_seed(216797721)

image = pipeline(
    prompt_embeds=conditioning,
    pooled_prompt_embeds=pooled,
    negative_prompt="bokeh",
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=generator,
    image=preprocessed,  # line art for the ControlNet
    controlnet_conditioning_scale=0.8,
    control_guidance_end=0.5,  # stop applying the ControlNet halfway through
    ip_adapter_image=source_image,  # the shoe itself as the IP Adapter reference
).images[0]
image.save("result.png")
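To make the effect of `control_guidance_end=0.5` with `num_inference_steps=25` more concrete, here is a small sketch of which denoising steps the ControlNet would be applied to. It mirrors my understanding of how diffusers decides this per step and should be treated as an assumption rather than the library's exact code; `controlnet_active_steps` is a hypothetical helper, not a diffusers function.

```python
def controlnet_active_steps(num_steps, start=0.0, end=1.0):
    """Return the step indices where the ControlNet would be applied.

    Assumption: a step i is skipped when its start fraction i / num_steps is
    before `start`, or its end fraction (i + 1) / num_steps is past `end`.
    """
    return [
        i
        for i in range(num_steps)
        if not (i / num_steps < start or (i + 1) / num_steps > end)
    ]


if __name__ == "__main__":
    active = controlnet_active_steps(25, start=0.0, end=0.5)
    print(f"ControlNet active on {len(active)} of 25 steps: {active}")
```

Under this assumption the line art shapes the shoe during the first 12 steps, and the remaining steps are free to refine details without the ControlNet constraint.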
-
Hi there
Above, I got this Nike shoe.
My goal is simple: take a photo of an actor/actress wearing a pair of shoes and inpaint this Nike in place of what they are wearing. How can I do this so that the inpainted Nike's orientation/angle matches that of the shoe in the picture?