Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StableDiffusion3 pipeline RuntimeError when using prompt_embeds #10712

Open
pjjajal opened this issue Feb 4, 2025 · 2 comments
Open

StableDiffusion3 pipeline RuntimeError when using prompt_embeds #10712

pjjajal opened this issue Feb 4, 2025 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@pjjajal
Copy link

pjjajal commented Feb 4, 2025

Describe the bug

StableDiffusion3 pipeline throws a RuntimeError when using prompt_embeds in lieu of prompt when using num_images_per_prompt > 1.

I am attempting to generate images using the StableDiffusion3 pipeline with some precomputed prompt embeddings. The prompt embeddings using the .encode_prompt(...) method of the pipeline and are passed to the call of the pipeline. Passing these encoded prompts to the pipeline leads to a Runtime error when:

  • num_images_per_prompt >1 for both the .encode_prompt(...) and the __call__(...).
  • num_images_per_prompt=1 for .encode_prompt(...) and num_images_per_prompt >1 for the __call__(...).

The StableDiffusionXL pipeline does not have these errors.

Reproduction

StableDiffusion3 Failing Cases

encode_prompt num_images_per_prompt>1 and call num_images_per_prompt>1

The code for this failing case is below:

import torch
from diffusers import DiffusionPipeline

model_name = "stabilityai/stable-diffusion-3.5-medium"
pipe = DiffusionPipeline.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

# encode the prompts
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(
    prompt="A painting of a cat",
    prompt_2=None,
    prompt_3=None,
    device="cuda",
    do_classifier_free_guidance=True,
    num_images_per_prompt=2, # NOTE
)

# sample (generate) from the diffusion model.
with torch.inference_mode():
    out = pipe(
        height=64, # this is set small for speeding up testing
        width=64, # this is set small for speeding up testing
        num_images_per_prompt=2,  # NOTE
        num_inference_steps=1,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        generator=torch.Generator(0)
    )

encode_prompt num_images_per_prompt=1 and call num_images_per_prompt>1

import torch
from diffusers import DiffusionPipeline

model_name = "stabilityai/stable-diffusion-3.5-medium"
pipe = DiffusionPipeline.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

# encode the prompts
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(
    prompt="A painting of a cat",
    prompt_2=None,
    prompt_3=None,
    device="cuda",
    do_classifier_free_guidance=True,
    num_images_per_prompt=1, # NOTE
)

# sample (generate) from the diffusion model.
with torch.inference_mode():
    out = pipe(
        height=64, # this is set small for speeding up testing
        width=64, # this is set small for speeding up testing
        num_images_per_prompt=2,  # NOTE
        num_inference_steps=1,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        generator=torch.Generator(0)
    )

Expected Behaviour (StableDiffusionXL pipeline)

encode_prompt num_images_per_prompt=1 and call num_images_per_prompt>1

import torch
from diffusers import DiffusionPipeline

model_name = "stabilityai/sdxl-turbo"
pipe = DiffusionPipeline.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(
    prompt="A painting of a cat",
    device="cuda",
    do_classifier_free_guidance=True,
    num_images_per_prompt=1,
)

with torch.inference_mode():
    out = pipe(
        # prompt="Cat",
        height=64,
        width=64,
        num_images_per_prompt=2,
        num_inference_steps=1,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        generator=torch.Generator(0)
    )
print(len(out.images)) # this returns 2.

encode_prompt num_images_per_prompt>1 and call num_images_per_prompt>1

import torch
from diffusers import DiffusionPipeline

model_name = "stabilityai/sdxl-turbo"
pipe = DiffusionPipeline.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(
    prompt="A painting of a cat",
    device="cuda",
    do_classifier_free_guidance=True,
    num_images_per_prompt=2,
)

with torch.inference_mode():
    out = pipe(
        # prompt="Cat",
        height=64,
        width=64,
        num_images_per_prompt=2,
        num_inference_steps=1,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        generator=torch.Generator(0)
    )
print(len(out.images)) # this returns 4.

Logs

Traceback (most recent call last):
  File "/home/jajal/research/diffusion-trajectory/sd3.py", line 26, in <module>
    out = pipe(
          ^^^^^
  File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jajal/research/diffusers/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 1060, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jajal/research/diffusers/src/diffusers/models/transformers/transformer_sd3.py", line 389, in forward
    temb = self.time_text_embed(timestep, pooled_projections)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jajal/research/diffusers/src/diffusers/models/embeddings.py", line 1606, in forward
    conditioning = timesteps_emb + pooled_projections
                   ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (8) must match the size of tensor b (4) at non-singleton dimension 0

System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.12.8
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.28.1
  • Transformers version: 4.48.2
  • Accelerate version: 1.3.0
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.2
  • xFormers version: 0.0.29.post2
  • Accelerator: NVIDIA GeForce RTX 4070 Ti, 12282 MiB
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@yiyixuxu @sayakpaul

@pjjajal pjjajal added the bug Something isn't working label Feb 4, 2025
@sayakpaul
Copy link
Member

Cc: @yiyixuxu

Copy link
Contributor

github-actions bot commented Mar 6, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 6, 2025
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Mar 6, 2025
@yiyixuxu yiyixuxu self-assigned this Mar 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants