MultiControlNetUnionModel on SDXL #10747

Merged
merged 8 commits
Feb 12, 2025

Conversation

guiyrt
Contributor

@guiyrt guiyrt commented Feb 7, 2025

What does this PR do?

New MultiControlNetUnionModel wrapper class to handle multiple ControlNetUnionModels, similar to MultiControlNetModel. Addresses #10656, which asks for control over the start, end and scale of each condition image.

Input
Segmentation Pose
Inference code
import torch

from diffusers import StableDiffusionXLControlNetUnionPipeline
from diffusers.models import ControlNetUnionModel, AutoencoderKL
from diffusers.utils import load_image

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "brad-twinkl/controlnet-union-sdxl-1.0-promax"

controlnet = ControlNetUnionModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetUnionPipeline.from_pretrained(
    model_id,
    controlnet=[controlnet, controlnet],
    vae=AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    ),
    torch_dtype=torch.float16,
    variant="fp16",
)

room_seg_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/room_seg.png")
pose_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/pose.png")


pipe.enable_model_cpu_offload()

image = pipe(
    prompt="an astronaut in space",
    width=1024,
    height=1024,
    negative_prompt="lowres, low quality, worst quality",
    generator=torch.manual_seed(42),
    guidance_scale=5,
    num_inference_steps=50,
    control_image=[[pose_img], [room_seg_img]],
    control_mode=[[0], [5]]
).images[0]

image.save("result.jpg")

First, I ran the pipeline as before, using a single ControlNetUnionModel with pose, segmentation and pose+segmentation conditions, to have outputs to compare with.

ControlNetUnionModel

Segmentation Pose Segmentation + Pose

MultiControlNetUnionModel

Two instances of ControlNetUnionModel: one received the segmentation conditioning and the other the pose. I set controlnet_conditioning_scale to [0.0, 1.0] and [0.0, 1.0] so the outputs could be compared against the single-conditioning ControlNetUnionModel outputs above. As we see (and expect), these outputs are the same. The output is different when using both segmentation and pose conditioning, which I think is expected.

Segmentation Pose Segmentation + Pose

Before submitting

Who can review?

@hlky @yiyixuxu @vladmandic @asomoza

@guiyrt guiyrt marked this pull request as draft February 7, 2025 20:43
@guiyrt guiyrt marked this pull request as ready for review February 7, 2025 21:12
@guiyrt
Contributor Author

guiyrt commented Feb 7, 2025

Some points I had in my mind:

  1. What's the standing on control_image / image parameter naming? Is this something relevant to change here?
  2. Should I write new tests for MultiControlNetUnionModel?
  3. In the forward() of ControlNetUnionModel, we have the argument control_type_idx which is the multi-hot encoding of the conditions used. Moving this to inside the function would remove some code from the pipelines, as it is derived from control_type. And we wouldn't need to handle it differently for multi or single controlnet_union, as we do now. As we are updating the controlnet_union pipelines, we could easily change this, if you find it relevant :) (does not change pipeline public interface)
# Example, [2,5] -> (0,0,1,0,0,1,0,0)
if isinstance(controlnet, ControlNetUnionModel):
    control_type = torch.zeros(controlnet.config.num_control_type).scatter(0, torch.tensor(control_mode), 1)
elif isinstance(controlnet, MultiControlNetUnionModel):
    control_type = [
        torch.zeros(controlnet_.config.num_control_type).scatter(0, torch.tensor(control_mode_), 1)
        for control_mode_, controlnet_ in zip(control_mode, self.controlnet.nets)
    ]

@guiyrt guiyrt changed the title [WIP] MultiControlNetUnionModel on SDXL MultiControlNetUnionModel on SDXL Feb 7, 2025
Member

@hlky hlky left a comment


  1. control_image / image is the standard naming, we'd like to keep it consistent across pipelines
  2. That would be great
  3. We have that in the pipeline to avoid re-computing it for every sampling step. The original had it in the model and used nonzero which caused a cuda sync every step. We changed it to avoid the sync, using scatter here is nice though, looks like we can use scatter_ to be in-place.

@hlky hlky requested a review from yiyixuxu February 7, 2025 22:33
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@guiyrt
Contributor Author

guiyrt commented Feb 8, 2025

  1. control_image / image is the standard naming, we'd like to keep it consistent across pipelines

👍, was checking because of #10131 (comment)

  1. That would be great

👍, will add a new test class

  1. We have that in the pipeline to avoid re-computing it for every sampling step. The original had it in the model and used nonzero which caused a cuda sync every step. We changed it to avoid the sync, using scatter here is nice though, looks like we can use scatter_ to be in-place.

That makes sense, keeping it as is and updating to use scatter_ instead.
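For reference, a minimal sketch of the in-place variant (matching the snippet in the PR description, illustrative only):

# Same multi-hot encoding as above, but using the in-place scatter_
control_type = torch.zeros(controlnet.config.num_control_type)
control_type.scatter_(0, torch.tensor(control_mode), 1)  # e.g. [2, 5] -> (0,0,1,0,0,1,0,0)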

@john09282922

@guiyrt, Hi, thanks for the awesome work. How can I use multi-controlnet-union? Is there example code? And can you give me an example of setting the scale for each condition?

@yiyixuxu
Collaborator

@hlky feel free to merge once it looks good to you!

@hlky
Member

hlky commented Feb 10, 2025

@guiyrt Can you check the output of ControlNetUnionModel with Segmentation + Pose against main?

@guiyrt
Contributor Author

guiyrt commented Feb 10, 2025

@guiyrt Can you check the output of ControlNetUnionModel with Segmentation + Pose against main?

Yep, I'll post it here in a sec.

I'm working on the tests. One of the issues is related to enable_sequential_cpu_offload: we have the same problem we had with the SigLIP image encoder we used for the SD3 IP-Adapter, not sure if you remember (
#9987 (comment)). ControlNetUnionModel uses nn.MultiheadAttention, which only works with enable_sequential_cpu_offload if you exclude it from offloading. Only then does it pass the test_sequential_cpu_offload_forward_pass test.

@guiyrt
Contributor Author

guiyrt commented Feb 10, 2025

@guiyrt, Hi, thanks for the awesome work. How can I use multi-controlnet-union? Is there example code? And can you give me an example of setting the scale for each condition?

Hi @john09282922, example inference code is above in a dropdown of the PR description, but I'll paste it here again :). To change the condition scale, start and end, you just pass them as you normally would, but now in a list. For example, you would pass control_guidance_start as [0.1, 0.0, 0.5] if you have 3 controlnet_unions: 0.1 goes to the first, 0.0 to the second and 0.5 to the third. Same for control_guidance_end and controlnet_conditioning_scale. There is a short illustrative snippet of these parameters after the inference code below.

Inference code
import torch

from diffusers import StableDiffusionXLControlNetUnionPipeline
from diffusers.models import ControlNetUnionModel, AutoencoderKL
from diffusers.utils import load_image

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "brad-twinkl/controlnet-union-sdxl-1.0-promax"

controlnet = ControlNetUnionModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetUnionPipeline.from_pretrained(
    model_id,
    controlnet=[controlnet, controlnet],
    vae=AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    ),
    torch_dtype=torch.float16,
    variant="fp16",
)

room_seg_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/room_seg.png")
pose_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/pose.png")


pipe.enable_model_cpu_offload()

image = pipe(
    prompt="an astronaut in space",
    width=1024,
    height=1024,
    negative_prompt="lowres, low quality, worst quality",
    generator=torch.manual_seed(42),
    guidance_scale=5,
    num_inference_steps=50,
    control_image=[[pose_img], [room_seg_img]],
    control_mode=[[0], [5]]
).images[0]

image.save("result.jpg")

@guiyrt
Contributor Author

guiyrt commented Feb 10, 2025

@guiyrt Can you check the output of ControlNetUnionModel with Segmentation + Pose against main?

@hlky Corporate needs you to find the differences between this picture and this picture

Main 34ab1af
Inference code
import torch

from diffusers import StableDiffusionXLControlNetUnionPipeline
from diffusers.models import ControlNetUnionModel, AutoencoderKL
from diffusers.utils import load_image

model_id = "stabilityai/stable-diffusion-xl-base-1.0"

controlnet = ControlNetUnionModel.from_pretrained(
    "brad-twinkl/controlnet-union-sdxl-1.0-promax", torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetUnionPipeline.from_pretrained(
    model_id,
    controlnet=controlnet,
    vae=AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    ),
    torch_dtype=torch.float16,
    variant="fp16",
)

room_seg_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/room_seg.png")
pose_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/pose.png")


pipe.enable_model_cpu_offload()

image = pipe(
    prompt="an astronaut in a space station",
    width=1024,
    height=1024,
    negative_prompt="lowres, low quality, worst quality",
    generator=torch.manual_seed(42),
    guidance_scale=5,
    num_inference_steps=50,
    control_image=[pose_img, room_seg_img],
    control_mode=[0, 5],
).images[0]

image.save("result_main.jpg")

@hlky
Member

hlky commented Feb 10, 2025

Thanks @guiyrt. Just checking the output hasn't changed. I think we should expect the output to be the same between ControlNetUnionModel and MultiControlNetUnionModel, but both outputs for Segmentation + Pose look a little off to me.

cc @asomoza we're using control images of different resolutions here, and not resizing to the generation resolution. In your testing of ControlNetUnion, does this affect the result? Do you have any recommendations for two control images that are known to work together?

@guiyrt
Contributor Author

guiyrt commented Feb 10, 2025

Thanks @guiyrt. Just checking output hasn't changed, I think we should be expecting the output to be the same between ControlNetUnionModel and MultiControlNetUnionModel, but both outputs for Segmentation + Pose look a little off to me.

It works well for each condition, but getting the best results when using both might require trying different controlnet_conditioning_scale values, for example. Also, the best results I saw had both conditions complementing each other, which is not the case here.

cc @asomoza we're using control images of different resolutions here, and not resizing to the generation resolution, in your testing of ControlNetUnion does this affect the result?

The control images are passed through the VaeImageProcessor, which resizes them, but the aspect ratio is not preserved (of the pose image, in this case). A manual resize that keeps the aspect ratio is sketched after the images below.

Original Processed
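If you want to avoid the aspect-ratio distortion, one workaround is to resize and center-crop the control image yourself before passing it to the pipeline (a sketch with PIL, not something the pipeline does for you):

from PIL import Image

def resize_keep_aspect(img: Image.Image, width: int = 1024, height: int = 1024) -> Image.Image:
    # Scale so the image covers the target size, then center-crop the excess,
    # keeping the original aspect ratio instead of stretching.
    scale = max(width / img.width, height / img.height)
    resized = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    left = (resized.width - width) // 2
    top = (resized.height - height) // 2
    return resized.crop((left, top, left + width, top + height))

pose_img = resize_keep_aspect(pose_img)  # then pass it to the pipeline as usual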

@asomoza
Member

asomoza commented Feb 11, 2025

I'm not particularly fond of the auto resizing of the images for controlnet. The good controlnets are really affected by bad resizing or resolutions, so I would prefer it to throw an error instead of auto-scaling the images, especially if it messes up the aspect ratio, but that's another issue and not something for this PR.

@hlky, using a real scenario with real images at the correct sizes, and also with the correct resolution for the conditioning images (the preprocessors have their own resolutions too), one common scenario is to use a depth map together with one of the edge or line conditions. I like to use teed with a special combination I learned from anyline.

This combination makes it really easy to test, because if you want a good image you'll need to lower the scales and the guidance ends, so you can tell when they're not working well together.

This is my test with it:

depth teed both

In this case, the depth map is the one that gives the overall scene composition and lighting, and the teed one is the one that adds the details, especially for the river waves and the background.

How does it compare to a single controlnet in main? I can test it with specific conditioning scales and ends for each one, so here I'm testing with both conditions at 1.0:

single controlnet multi conditions multi controlnet
single both_full

They're different but in theory they should produce the same result. To test this, I tested each condition separately with the single controlnet union:

single controlnet depth single controlnet teed
single_depth single_teed

There's definitely something going on here, but if you ask me, I like the result from the multicontrolnet in this PR more than the original single one. Also, I mostly use it with multicontrolnets because I like to control each one with a different guidance scale and end.

To me it seems that the single controlnet with multiple conditions is not applying the depth condition as strongly and takes the teed one more into account.

Member

@hlky hlky left a comment


Thanks @guiyrt

@guiyrt
Contributor Author

guiyrt commented Feb 12, 2025

I'm not particularly fond of the auto resizing of the images for controlnet, the good controlnets are really affected by bad resizings or resolutions, so I would like it more if it throws an error instead of auto scaling the images specially if it messes up the aspect ratio, but that's another issue and not something for this PR.

I agree with this. Is there something in place for the other controlnet pipelines regarding preserving aspect ratio? Otherwise, if you see value in it, I could work on something.

They're different but in theory they should produce the same result. To test this, I tested each condition separately with the single controlnet union:

I don't think an equal result is expected from "single controlnet multi conditions" and "multi controlnet". From the standpoint of controlnet inputs, the control embedding is different, and the fused control condition is also different, right? The output should be similar, but not equal.

@asomoza
Member

asomoza commented Feb 12, 2025

I don't think an equal result is expected from "single controlnet multi conditions" and "multi controlnet". From the standpoint of controlnet inputs, the control embedding is different, and the fused control condition is also different, right? The output should be similar, but not equal.

Yeah, I probably worded it badly, that's what I meant: they're different when comparing them, and as I wrote before, I don't use multiple conditions in the same controlnet union except for testing. I'll do some more testing later.

Also, I wouldn't make this something that important or a blocker. ControlNet union is really weird and cool at the same time: you can even mix the conditions in one image and it will still work, and you can use conditions with other control types to get some interesting results; most of the time they will still work.

@yiyixuxu yiyixuxu merged commit 5105b5a into huggingface:main Feb 12, 2025
12 checks passed
@vladmandic
Contributor

    control_image=[[pose_img], [room_seg_img]],
    control_mode=[[0], [5]]

Why are those now lists of lists?
This is not how any other multicontrolnet works, where control_image is a list of images.

@asomoza
Member

asomoza commented Feb 12, 2025

Isn't it because this is MultiControlNetUnion, where each controlnet (union) accepts a list of images?

If I understand correctly what you're saying, you're suggesting that we make the controlnet union a single-image controlnet when used with multi controlnets?

@vladmandic
Contributor

vladmandic commented Feb 12, 2025

yes, exactly.

  • it's either a single controlnet union with multiple inputs (and then we have no independent scale/start/end), or
  • it's a multi controlnet union, each with a single input

Doing multi-of-multi is not something that can be effectively used, and it makes assembling the correct params a complete nightmare and non-standard compared with any other controlnet.

@elismasilva
Contributor

@guiyrt If you apply a control mask to the pose control, you can improve the segmentation controlnet; see this:

Without mask With mask
image image

I am using conditioning scale, see:
image

I changed the prompt to "an astronaut in space, inside spaceship" and used only 30 steps.
If the image quality is bad, that's expected, as I am using float8 inference.

PS: I am not using your pipeline, because I had already implemented the use of controlnet union in my pipeline before the official publication, but I believe the result will be the same in your case; what should help is the control mask.

@guiyrt
Contributor Author

guiyrt commented Feb 13, 2025

  • its either single controlnet union with multi inputs (and then we deal with no independent scale/start/end) or
  • its multi controlnet union, each with single input

I looked into the Flux implementation for context, and it is indeed implemented differently. From what I gathered, there is no FluxControlNetUnionModel; there is only FluxControlNetModel, whose forward() has the argument controlnet_mode, which is None for normal controlnets and a single value for controlnet unions. This means you cannot process multi-condition input in a single controlnet union execution. When you have multiple conditions, you instead use FluxMultiControlNetModel, but that calls your controlnet union for each condition input separately. In this case, you have reduced memory usage, as a single controlnet is loaded, but you don't benefit from reduced execution time compared to running two single-purpose controlnets.
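Roughly, the Flux-style approach works like the following sketch (not the actual diffusers code; argument names like controlnet_cond and control_mode are placeholders): one controlnet call per condition, with the residuals summed.

import torch

class MultiControlNetSketch(torch.nn.Module):
    def __init__(self, nets):
        super().__init__()
        self.nets = torch.nn.ModuleList(nets)

    def forward(self, sample, timestep, cond_images, cond_modes, scales, **kwargs):
        down_sum, mid_sum = None, None
        # One forward pass per condition, even if every entry in self.nets
        # is the same controlnet union instance.
        for net, image, mode, scale in zip(self.nets, cond_images, cond_modes, scales):
            down, mid = net(sample, timestep, controlnet_cond=image,
                            control_mode=mode, conditioning_scale=scale, **kwargs)
            if down_sum is None:
                down_sum, mid_sum = list(down), mid
            else:
                down_sum = [d + d_new for d, d_new in zip(down_sum, down)]
                mid_sum = mid_sum + mid
        return down_sum, mid_sum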

doing multi-of-multi is not something that can be effectively used - and makes assembling correct params a complete nightmare and non-standard with any other controlnet.

I agree with you that in the current state, the list-of-lists input is confusing if you have many conditions or many controlnets, but I also don't think we should disregard the usage of multi-condition single execution if a controlnet union supports it. After giving some thought, maybe the ideal scenario would be the following:

  • ControlNetModel: regular controlnets and controlnet unions in single-condition mode, like for Flux. Input is a single condition image and a single condition mode (which can be None for normal controlnets).
  • ControlNetUnionModel: exclusively for controlnet unions that operate in multi-condition mode, like in the current version. Input is a list of condition images and a list of condition modes. Executes the controlnet union once, independently of the number of condition images.
  • MultiControlNetModel: multiple ControlNetModels, like in Flux. Input is a list of condition images and a list of condition modes. Executes the controlnets multiple times, as many as there are condition images. Nets can be (like for Flux):
    • a single ControlNetModel union operating in single-condition mode. This way, you can control condition scale/start/end individually.
    • a mix of normal ControlNetModel and ControlNetModel union in single-condition mode.
    • CANNOT BE a ControlNetUnionModel.

Compared to the current version, this would have the benefit of a simplified interface for MultiControlNetModel, as in FluxMultiControlNetModel, and you could still benefit from multi-condition single execution of controlnet unions via ControlNetUnionModel. The drawback is that you cannot use ControlNetUnionModel in MultiControlNetModel. If you want to use normal controlnets and controlnet unions together, it can still be done, but with the controlnet unions in single-condition mode.

The more complex interface comes from MultiControlNetUnionModel, where you can pass a list of conditions for each of your controlnets. In this proposal that would be removed, and with that you couldn't run multiple controlnet unions in multi-condition mode. Not sure how common of a use case that is, but if you want this functionality, you always need to pass a list of lists for condition images and modes.

And I also think we wouldn't need separate controlnet_union pipelines, such as StableDiffusionXLControlNetUnionPipeline, as the interface would be the same (same as Flux currently).

@john09282922

Hi, can you also merge this into the SDXL inpaint pipeline, like pipeline_controlnet_union_inpaint_sd_xl.py?
@yiyixuxu @hlky

Thanks,

@hlky
Member

hlky commented Feb 13, 2025

@vladmandic You requested this use case, and @guiyrt has very kindly taken up MultiControlNetUnion. I've also done experiments to use scale per condition without MultiControlNetUnion, #10723. Yes, it is a different interface, but it is a unique model; all we need to do to handle this in any integration is a simple if/else.

@vladmandic
Contributor

vladmandic commented Feb 13, 2025

@hlky @guiyrt I'll make it work, and I definitely do appreciate the work; I'm just worried about the lack of standardization, which significantly increases complexity for the end user.
This could be avoided by non-structural changes: instead of throwing a runtime error, it could be as simple as

   if isinstance(control_mode, list) and isinstance(control_mode[0], int):
       control_mode = [[x] for x in control_mode]
       control_image = [[x] for x in control_image]

@guiyrt
Contributor Author

guiyrt commented Feb 13, 2025

@hlky @guiyrt I'll make it work, and I definitely do appreciate the work; I'm just worried about the lack of standardization, which significantly increases complexity for the end user. This could be avoided by non-structural changes: instead of throwing a runtime error, it could be as simple as

   if isinstance(control_mode, list) and isinstance(control_mode[0], int):
       control_mode = [[x] for x in control_mode]
       control_image = [[x] for x in control_image]

If you have a single controlnet union, you can still pass a list of control images and a list of control modes. But if you have more than one controlnet, there is no way around a list of lists, otherwise we couldn't infer which input would go to which controlnet (assuming you want to pass multiple conditions to a single controlnet union).

I'll put here an example using three conditions, and the differences in using one and two controlnets.

Single controlnet union

StableDiffusionXLControlNetUnionPipeline is instantiated with a single controlnet. All conditions go to the single controlnet; control_image is a list of images and control_mode is a list of ints.

import torch

from diffusers import StableDiffusionXLControlNetUnionPipeline
from diffusers.models import ControlNetUnionModel, AutoencoderKL
from diffusers.utils import load_image

controlnet = ControlNetUnionModel.from_pretrained(
    "brad-twinkl/controlnet-union-sdxl-1.0-promax", torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetUnionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    ),
    torch_dtype=torch.float16,
    variant="fp16",
)

seg_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/room_seg.png")
seg_mode = 5

pose_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/pose.png")
pose_mode = 0

depth_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/stormtrooper_depth.png")
depth_mode = 1

pipe.enable_model_cpu_offload()

image = pipe(
    prompt="an astronaut in space",
    width=1024,
    height=1024,
    negative_prompt="lowres, low quality, worst quality",
    generator=torch.manual_seed(42),
    guidance_scale=5,
    num_inference_steps=50,
    control_image=[pose_img, seg_img, depth_img],
    control_mode=[pose_mode, seg_mode, depth_mode]
).images[0]

image.save("result.jpg")

Multiple controlnet unions

Now, StableDiffusionXLControlNetUnionPipeline is instantiated with a list of two controlnets. Pose and segmentation conditions go to the first controlnet and depth goes to the second controlnet. control_image is a list of lists of images and control_mode is a list of lists of ints.

import torch

from diffusers import StableDiffusionXLControlNetUnionPipeline
from diffusers.models import ControlNetUnionModel, AutoencoderKL
from diffusers.utils import load_image

controlnet = ControlNetUnionModel.from_pretrained(
    "brad-twinkl/controlnet-union-sdxl-1.0-promax", torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetUnionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[controlnet, controlnet],
    vae=AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    ),
    torch_dtype=torch.float16,
    variant="fp16",
)

seg_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/room_seg.png")
seg_mode = 5

pose_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/pose.png")
pose_mode = 0

depth_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/stormtrooper_depth.png")
depth_mode = 1

pipe.enable_model_cpu_offload()

image = pipe(
    prompt="an astronaut in space",
    width=1024,
    height=1024,
    negative_prompt="lowres, low quality, worst quality",
    generator=torch.manual_seed(42),
    guidance_scale=5,
    num_inference_steps=50,
    control_image=[[pose_img, seg_img], [depth_img]],
    control_mode=[[pose_mode, seg_mode], [depth_mode]]
).images[0]

image.save("result_multi.jpg")

This is a dummy experiment, but I'll still post the outputs here.

Single `ControlNetUnionModel` `MultiControlNetUnionModel`

What we could easily change is to assume that, if a single value is passed and the controlnet is a MultiControlNetUnionModel, that single element is converted to a list. It's a bit different than what you suggested before, but the effect would be that [[pose_img, seg_img], depth_img] transforms into [[pose_img, seg_img], [depth_img]]. The same for control modes. Is this closer to what you had in mind? @vladmandic

This would produce the effect you mentioned if you intend to pass a single condition to each controlnet union. If you passed [pose_img, seg_img, depth_img] to a single ControlNetUnionModel, all the conditions would go to that controlnet. But if you had a MultiControlNetUnionModel, this input assumes you have 3 ControlNetUnionModels, and each would get one condition, equivalent to [[pose_img], [seg_img], [depth_img]], which I think was what you were aiming for. A minimal sketch of this normalization is below.
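A minimal sketch of that normalization, assuming it lives in the pipeline's input checks (hypothetical placement):

# Wrap bare entries in single-element lists when using MultiControlNetUnionModel.
if isinstance(controlnet, MultiControlNetUnionModel):
    control_image = [img if isinstance(img, list) else [img] for img in control_image]
    control_mode = [mode if isinstance(mode, list) else [mode] for mode in control_mode]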

I'm happy to iterate on this, let's just define what is expected and then execute on that :)

@vladmandic
Contributor

I'm ok with leaving this as-is; I'll add a special-case handler on my side.

@elismasilva
Contributor

Hi @guiyrt, I know it's late to share this now, but I just want to show how I did it in my pipeline. Inside the denoising loop:

 # controlnet(s) inference
if guess_mode and do_classifier_free_guidance:
    # Infer ControlNet only for the conditional batch.
    control_model_input = latents
    control_model_input = self.scheduler.scale_model_input(control_model_input, t)
    controlnet_prompt_embeds = prompt_embeds.chunk(2)[1]
    controlnet_added_cond_kwargs = {
        "text_embeds": add_text_embeds.chunk(2)[1],
        "time_ids": add_time_ids.chunk(2)[1],
    }
else:
    control_model_input = latent_model_input
    controlnet_prompt_embeds = prompt_embeds
    controlnet_added_cond_kwargs = added_cond_kwargs

if union_control_type is not None:
    # controlnet union index
    # 0 -- openpose
    # 1 -- depth
    # 2 -- hed/pidi/scribble/ted
    # 3 -- canny/lineart/anime_lineart/mlsd
    # 4 -- normal
    # 5 -- segment
    # 6 -- tile
    # 7 -- repaint
    union_controlnets = {k: v for k, v in enumerate(union_control_type)}

#reset blocks variable
down_block_res_samples = None
mid_block_res_sample = None
down_block_res_samples_list, mid_block_res_sample_list = [], [] 

if isinstance(self.controlnet, MultiControlNetModel): 
    total_controlnet=len(self.controlnet.nets)                           
    for control_index in range(total_controlnet):
        #set conditioning_scale
        if isinstance(controlnet_keep[i], list):
            cond_scale = [float(c * s) for c, s in zip(controlnet_conditioning_scale, [controlnet_keep[i][control_index]] * len(controlnet_conditioning_scale))]
        else:
            controlnet_cond_scale = controlnet_conditioning_scale
            if isinstance(controlnet_cond_scale, list):
                controlnet_cond_scale = controlnet_cond_scale[0]
            cond_scale = controlnet_cond_scale * controlnet_keep[i][control_index]

        if(isinstance(self.controlnet.nets[control_index], ControlNetModel)):             
            self.controlnet.nets[control_index] = self.controlnet.nets[control_index].to(_device) 
            down_block_res_samples, mid_block_res_sample = self.controlnet.nets[control_index](
                control_model_input,
                t,
                encoder_hidden_states=controlnet_prompt_embeds,
                controlnet_cond=control_image[control_index],
                conditioning_scale=cond_scale[control_index],
                guess_mode=guess_mode,
                added_cond_kwargs=controlnet_added_cond_kwargs,
                return_dict=False)

            # controlnet mask
            if (apply_control_masks[control_index]):                                                        
                if control_mask is not None and len(control_mask) > 0:
                    down_block_res_samples, mid_block_res_sample = self.apply_mask(control_mask[control_index], _device, dtype, down_block_res_samples, mid_block_res_sample)

            down_block_res_samples_list.append(down_block_res_samples)
            mid_block_res_sample_list.append(mid_block_res_sample)
            
            if self.controlnet.nets[control_index].device != "cpu": #release memory to next controlnet
                self.controlnet.nets[control_index] = self.controlnet.nets[control_index].to("cpu") 
            
        elif (isinstance(self.controlnet.nets[control_index], ControlNetModel_Union)):
            for k, v in union_controlnets.items():                                        
                if self.controlnet.nets[control_index].device != "cuda":
                    self.controlnet.nets[control_index] = self.controlnet.nets[control_index].to(_device)                                                                                                     
                new_control_type = [0] * 8
                controlnet_cond_list = [0] * 8
                new_control_type[k] = 1
                control_type = torch.Tensor(new_control_type)
                control_type = control_type.reshape(1, -1).to(_device, dtype=dtype).repeat(batch_size * num_images_per_prompt * (3 if self.do_perturbed_attention_guidance else 2), 1)                
                controlnet_cond_list[k] = control_image[k]

                added_cond_kwargs["control_type"]=control_type
                down_block_res_samples, mid_block_res_sample = self.controlnet.nets[control_index](
                    control_model_input,
                    t,
                    encoder_hidden_states=controlnet_prompt_embeds,
                    controlnet_cond_list=controlnet_cond_list,
                    conditioning_scale=cond_scale[k],
                    guess_mode=guess_mode,
                    added_cond_kwargs=controlnet_added_cond_kwargs,
                    return_dict=False,
                )
                # controlnet mask
                control_net_union_index = k
                if (apply_control_masks[control_net_union_index]):                     
                    if control_mask is not None and len(control_mask) > 0 and control_mask[control_net_union_index] is not None:
                        down_block_res_samples, mid_block_res_sample = self.apply_mask(control_mask[control_net_union_index], _device, dtype, down_block_res_samples, mid_block_res_sample)
                down_block_res_samples_list.append(down_block_res_samples)
                mid_block_res_sample_list.append(mid_block_res_sample)
                
            if self.controlnet.nets[control_index].device != "cpu": #release memory to next controlnet
                self.controlnet.nets[control_index] = self.controlnet.nets[control_index].to("cpu") 

    if mid_block_res_sample_list:
        mid_block_res_sample = torch.stack(mid_block_res_sample_list).sum(dim=0)

    if down_block_res_samples_list:
        down_block_res_samples = [torch.stack(down_block_res_samples).sum(dim=0) 
                                for down_block_res_samples in zip(*down_block_res_samples_list)]
    
else:
    #set conditioning_scale
    if isinstance(controlnet_keep[i], list):
        cond_scale = [float(c * s) for c, s in zip(controlnet_conditioning_scale, [controlnet_keep[i][control_index]] * len(controlnet_conditioning_scale))]
    else:
        controlnet_cond_scale = controlnet_conditioning_scale
        if isinstance(controlnet_cond_scale, list):
            controlnet_cond_scale = controlnet_cond_scale[0]
        cond_scale = controlnet_cond_scale * controlnet_keep[i]

    self.controlnet.to(_device)   
    controlnet_prompt_embeds = prompt_embeds

    down_block_res_samples, mid_block_res_sample = self.controlnet(
        control_model_input,
        t,
        encoder_hidden_states=controlnet_prompt_embeds,
        controlnet_cond=control_image[0],
        conditioning_scale=cond_scale,
        guess_mode=guess_mode,
        added_cond_kwargs=controlnet_added_cond_kwargs,
        return_dict=False,
    )

    # controlnet mask
    if apply_control_masks[0]:                
        if control_mask is not None and len(control_mask) > 0:
            down_block_res_samples, mid_block_res_sample = self.apply_mask(control_mask[0], _device, dtype, down_block_res_samples, mid_block_res_sample)

if guess_mode and do_classifier_free_guidance:
    # Inferred ControlNet only for the conditional batch.
    # To apply the output of ControlNet to both the unconditional and conditional batches,
    # add 0 to the unconditional batch to keep it unchanged.
    down_block_res_samples = [torch.cat([torch.zeros_like(d), d]) for d in down_block_res_samples]
    mid_block_res_sample = torch.cat([torch.zeros_like(mid_block_res_sample), mid_block_res_sample])

if prompt_image_emb_ip is not None and len(prompt_image_emb_ip) > 0: 
    added_cond_kwargs["image_embeds"] = prompt_image_emb_ip                         

#noise_predict code ....

I receive these parameters for the controlnet in the call method:

control_image: PipelineImageInput = None,  # fixed list of 8 images, one per control type position
control_mask = None,  # optional, but if you use apply_control_masks you need to send a fixed list of 8 mask images
union_control_type = None,  # optional, receives a List[int] of 8 values, each 1 or 0
apply_control_masks: List[bool] = [],
controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
control_guidance_start: Union[float, List[float]] = 0.0,
control_guidance_end: Union[float, List[float]] = 1.0,
guess_mode: bool = False,

My implementation is based on the original code from controlnet_union and on the InstantID pipeline with multicontrolnet,
and I only changed this in the original MultiControlNetModel class:

def __init__(self, controlnets: Union[List[ControlNetModel], Tuple[ControlNetModel], List[ControlNetModel_Union], Tuple[ControlNetModel_Union]]):
    super().__init__()
    self.nets = nn.ModuleList(controlnets)

This way I can use ControlNetUnion together with ControlNetModel. In practice I only pass the MultiControlNet class, even if I am only setting a single ControlNet model on it, but you can pass a MultiControlNet class, ControlNetUnion or ControlNetModel to the pipeline's controlnet attribute, like this:

self.controlnet: Union[ControlNetModel, List[ControlNetModel], Tuple[ControlNetModel], MultiControlNetModel] = None

If you are not using the MultiControlNet class, the params don't need to be fixed lists; just append to the list in the same sequence that the models were loaded.

@john09282922

Hi, I have an issue when using the IP-Adapter: MultiControlNetUnionModel does not have set_attn_processor.

thanks,

@guiyrt @hlky

@hlky
Member

hlky commented Feb 13, 2025

@john09282922 Issues should be raised here; include a reproduction and the traceback.

@john09282922

Is it possible to use the IP-Adapter with multicontrolnet_union? It might be a similar structure to multicontrolnet... I'm not sure how to run it with the IP-Adapter.

@guiyrt @hlky @yiyixuxu

@asomoza
Member

asomoza commented Feb 18, 2025

What do you mean? The pipeline has the IPAdapterMixin, so you can just use it like any other SDXL pipeline with IP Adapters; there's nothing that changes in how you use it.

@john09282922

john09282922 commented Feb 18, 2025

What do you mean? The pipeline has the IPAdapterMixin, so you can just use it like any other SDXL pipeline with IP Adapters; there's nothing that changes in how you use it.

Hi, I got an issue when using the inpainting model. I just changed the current pipeline to the inpaint pipeline, and when using multi-controlnet_union there is a set_attn_processor issue in the MultiControlNetUnion class: it doesn't have that method.

@asomoza
Member

asomoza commented Feb 18, 2025

Hi, I got an issue when using the inpainting model. I just changed the current pipeline to the inpaint pipeline, and when using multi-controlnet_union there is a set_attn_processor issue in the MultiControlNetUnion class: it doesn't have that method.

I see, the problem is not with IP Adapters. This PR only introduces MultiControlNetUnion to StableDiffusionXLControlNetUnionPipeline, so it won't work with any other controlnet pipeline.

@john09282922

Hi, I got an issue when using the inpainting model. I just changed the current pipeline to the inpaint pipeline, and when using multi-controlnet_union there is a set_attn_processor issue in the MultiControlNetUnion class: it doesn't have that method.

I see, the problem is not with IP Adapters. This PR only introduces MultiControlNetUnion to StableDiffusionXLControlNetUnionPipeline, so it won't work with any other controlnet pipeline.

Okay, can you merge it into the other controlnet pipelines?

@asomoza
Member

asomoza commented Feb 18, 2025

Okay, can you merge it into the other controlnet pipelines?

We leave those kinds of tasks to the community, if they want to do it and there's popular demand for it. It should be relatively easy, and you can open a feature request for it, but I don't think people will take it up, for two reasons:

  • It's redundant, since controlnet union has an inpainting mode.
  • You get a lot better results with it than with an inpainting pipeline and model.
