Support SD3.5 large #107

Open
gebaltso opened this issue Oct 31, 2024 · 2 comments

Comments

gebaltso commented Oct 31, 2024

Hello, I tried to use Compel with SD3.5-large but got this error:

prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 333 but got size 77 for tensor number 1 in the list.

Code:

import torch
from diffusers import StableDiffusion3Pipeline
from compel import Compel

torch.cuda.empty_cache()

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

prompt = "A portrait of a Latino woman, green eyes--"

prompt_embeds = compel_proc(prompt)
pooled_prompt_embeds = compel_proc(prompt)

image = pipe(
        prompt_embeds=prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        num_images_per_prompt=1,
        num_inference_steps=28,
        guidance_scale=3.5,
    ).images[0]
    
image.save("i.jpg")
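
The sizes in the error are consistent with the SD3 pipeline building its own negative embeddings over 333 token positions (77 CLIP positions plus, by default, 256 T5 positions), while the Compel output from a single CLIP encoder covers only 77. The following is a shape-only sketch of the rule that `torch.cat` enforces. `cat_shape` is a hypothetical helper written for illustration, not a torch or Compel function, and the 256, 4096, and 768 figures are assumptions about the default SD3 configuration rather than values taken from the report:

```python
def cat_shape(shapes, dim=0):
    """Return the shape a concatenation along `dim` would have.

    Hypothetical helper: it only mirrors the rule that every dimension
    except `dim` must agree across all inputs.
    """
    first = shapes[0]
    total = 0
    for idx, shape in enumerate(shapes):
        if len(shape) != len(first):
            raise ValueError(f"Tensor number {idx} has a different rank")
        for d, (expected, got) in enumerate(zip(first, shape)):
            if d != dim and expected != got:
                raise ValueError(
                    f"Sizes of tensors must match except in dimension {dim}. "
                    f"Expected size {expected} but got size {got} "
                    f"for tensor number {idx} in the list."
                )
        total += shape[dim]
    out = list(first)
    out[dim] = total
    return tuple(out)


CLIP_TOKENS = 77   # CLIP context length (matches the 77 in the error)
T5_TOKENS = 256    # assumed default T5 sequence length in the SD3 pipeline

# Negative embeddings as the pipeline would build them (assumed width 4096).
negative_shape = (1, CLIP_TOKENS + T5_TOKENS, 4096)
# Embeddings from Compel with only the first CLIP encoder (assumed width 768).
compel_shape = (1, CLIP_TOKENS, 768)

try:
    cat_shape([negative_shape, compel_shape], dim=0)
except ValueError as err:
    mismatch_message = str(err)
    print(mismatch_message)
```

Under these assumptions the first mismatch is in the sequence dimension (333 vs. 77), which is the one the traceback reports.
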
@joel-simon

+1

Rav4567 commented Jan 29, 2025

SD3 needs all of its tokenizers and text encoders (pipeline.tokenizer, pipeline.tokenizer_2, pipeline.tokenizer_3, plus the matching text encoders) to work, so Compel would have to be initialized with all of them:

compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2, pipeline.tokenizer_3],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2, pipeline.text_encoder_3],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[True, True, True],
)

BUT:

SD3's pipeline.tokenizer_3 is a T5TokenizerFast, and Compel supports only CLIPTokenizer (see its init).
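
A quick way to see which tokenizer/encoder pairs can be handed to Compel is to look at the tokenizer class names. This is only a sketch: `split_clip_pairs` is a hypothetical helper, the two stub classes stand in for the real `transformers` tokenizers so the example runs without loading a model, and matching on the class name is an assumption, not a check that Compel itself performs.

```python
from types import SimpleNamespace


def split_clip_pairs(pipeline):
    """Split a pipeline's (tokenizer, text_encoder) pairs into CLIP and non-CLIP.

    Hypothetical helper: returns (clip_pairs, skipped_attribute_names).
    """
    clip_pairs, skipped = [], []
    for suffix in ("", "_2", "_3"):
        tok = getattr(pipeline, "tokenizer" + suffix, None)
        enc = getattr(pipeline, "text_encoder" + suffix, None)
        if tok is None or enc is None:
            continue
        # Assumption: CLIP tokenizers can be recognised by their class name.
        if type(tok).__name__.startswith("CLIPTokenizer"):
            clip_pairs.append((tok, enc))
        else:
            skipped.append("tokenizer" + suffix)
    return clip_pairs, skipped


# Stand-in classes so the sketch runs without downloading SD3.
class CLIPTokenizer:
    pass


class T5TokenizerFast:
    pass


fake_pipeline = SimpleNamespace(
    tokenizer=CLIPTokenizer(), text_encoder=object(),
    tokenizer_2=CLIPTokenizer(), text_encoder_2=object(),
    tokenizer_3=T5TokenizerFast(), text_encoder_3=object(),
)

pairs, skipped = split_clip_pairs(fake_pipeline)
print(len(pairs), skipped)
```

With a real pipeline, the two CLIP pairs are what the workaround below passes to Compel, and `tokenizer_3` is the one left out.
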

For experiments, you can hack around this (this is only a sample!):

```python
import torch
from compel import Compel, ReturnedEmbeddingsType

# `pipeline` is an already loaded StableDiffusion3Pipeline; only the two CLIP
# tokenizer/encoder pairs are passed, because Compel cannot take the T5 one.
compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    truncate_long_prompts=True,
    requires_pooled=[True, True],
)

cond_embs, con_pooled = compel.build_conditioning_tensor(prompt)
neg_embs, neg_pooled = compel.build_conditioning_tensor("")
[cond_embs, neg_embs] = compel.pad_conditioning_tensors_to_same_length(conditionings=[cond_embs, neg_embs])

# !!! Hack, be careful when using it :)
# Duplicate the embeddings along the last (feature) dimension to double their width.
cond_embs = torch.cat([cond_embs, cond_embs], -1)
neg_embs = torch.cat([neg_embs, neg_embs], -1)

images = pipeline(
    prompt_embeds=cond_embs,
    pooled_prompt_embeds=con_pooled,
    negative_prompt_embeds=neg_embs,
    negative_pooled_prompt_embeds=neg_pooled,
    num_inference_steps=30,
    num_images_per_prompt=1,
    guidance_scale=3.5,
).images
```
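
To see why duplicating along the last dimension makes the shapes line up, it helps to do the arithmetic. The sketch below is shape-only and rests on assumptions that are not stated in the thread: hidden sizes of 768 and 1280 for the two CLIP encoders, a width of 4096 expected by the SD3 transformer, and 256 T5 token positions by default. The function names are hypothetical.

```python
CLIP_L_DIM = 768    # assumed hidden size of pipeline.text_encoder
CLIP_G_DIM = 1280   # assumed hidden size of pipeline.text_encoder_2
T5_DIM = 4096       # assumed width expected by the SD3 transformer
CLIP_TOKENS = 77
T5_TOKENS = 256     # assumed default T5 sequence length


def dual_clip_shape(batch=1):
    """Shape of Compel's output when both CLIP encoders are concatenated feature-wise."""
    return (batch, CLIP_TOKENS, CLIP_L_DIM + CLIP_G_DIM)


def duplicated_shape(shape):
    """Shape after torch.cat([x, x], -1): the last dimension doubles."""
    batch, tokens, width = shape
    return (batch, tokens, width * 2)


def pipeline_native_shape(batch=1):
    """Shape the pipeline would build itself from CLIP plus T5 embeddings."""
    return (batch, CLIP_TOKENS + T5_TOKENS, T5_DIM)


hacked = duplicated_shape(dual_clip_shape())
native = pipeline_native_shape()
print("hacked:", hacked)
print("native:", native)
```

Under these assumptions the hacked tensor has the expected width (2 x 2048 = 4096) but only 77 token positions, with no T5 content at all. That is enough for the pipeline to run, since the positive and negative embeddings are padded to the same length, but the conditioning differs from what the model receives when given a plain prompt, so results may differ as well.
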
