Describe the bug
After #10723, ControlNet Union no longer handles control images correctly.
The clearest way to show this is to pass a depth map with the tile condition. The tile condition should return almost the same image as its input, but here it returns an image as if it had been conditioned on a depth map.
Inpainting also suffers badly from this issue: it mostly produces subpar results compared to the original implementation and to the behavior before that PR.
Reproduction
Run this code before and after the relevant PR:
import torch
from diffusers import ControlNetUnionModel, StableDiffusionXLControlNetUnionPipeline
from diffusers.utils import load_image

torch_dtype = torch.float16
prompt = "A cat"

# The control image is a depth map, but it is passed with the tile condition below.
control_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/resources/cat_depthmap.png"
)

controlnet_model = ControlNetUnionModel.from_pretrained(
    "OzzyGT/controlnet-union-promax-sdxl-1.0", variant="fp16", torch_dtype=torch_dtype
)

pipe = StableDiffusionXLControlNetUnionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    controlnet=controlnet_model,
    torch_dtype=torch_dtype,
)
pipe.to("cuda")

image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    num_inference_steps=20,
    control_image=control_image,
    controlnet_conditioning_scale=0.5,
    control_mode=6,  # tile condition
).images[0]

image.save("controlnet_union_test.png")
| Before | After |
|---|---|
| ![]() | ![]() |
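
The visual comparison can be backed by a rough number. Since the tile condition should return almost the same image as the control image, one option is to measure the mean absolute grayscale difference between the control image and the output, before and after the PR. The helper below is only a sketch of that idea: `mean_abs_pixel_diff` is a hypothetical name, not part of diffusers, and it works on plain pixel sequences so it needs nothing beyond the standard library. No threshold is implied; the point is only to compare the two runs.

```python
from typing import Sequence


def mean_abs_pixel_diff(pixels_a: Sequence[int], pixels_b: Sequence[int]) -> float:
    """Mean absolute difference between two equal-length grayscale pixel sequences.

    Hypothetical helper for illustration; values are expected in the 0-255 range.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("pixel sequences must have the same length")
    if len(pixels_a) == 0:
        raise ValueError("pixel sequences must not be empty")
    total = sum(abs(a - b) for a, b in zip(pixels_a, pixels_b))
    return total / len(pixels_a)


# Assumed usage with `control_image` and `image` from the reproduction script
# (both are PIL images; the resize guards against a size mismatch):
#   ref = control_image.convert("L").resize(image.size)
#   out = image.convert("L")
#   score = mean_abs_pixel_diff(list(ref.getdata()), list(out.getdata()))
#   print(score)
```

If the tile condition is applied correctly, the score from the pre-PR run should be noticeably lower than the score from the post-PR run, because the output then stays close to the control image.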
Who can help?
I'm opening this issue to the community since I don't have time right now. If no one wants to tackle it, I'll do it when I have time.