
Conversation


@Dong1017 Dong1017 commented Sep 17, 2025

What does this PR do?

Adds

1. QwenImage Pipelines and Required Modules

(aligned with Diffusers master)

a. Pipelines

  • mindone.diffusers.QwenImagePipeline
  • mindone.diffusers.QwenImageImg2ImgPipeline
  • mindone.diffusers.QwenImageInpaintPipeline
  • mindone.diffusers.QwenImageEditPipeline
  • mindone.diffusers.QwenImageEditInpaintPipeline

b. Modules

  • mindone.diffusers.models.AutoencoderKLQwenImage
  • mindone.diffusers.models.QwenImageTransformer2DModel
  • mindone.diffusers.loaders.QwenImageLoraLoaderMixin

2. Add UTs of pipelines (all passed)

(targeting Diffusers master)

  • tests/diffusers_tests/pipelines/qwenimage/test_qwenimage.py
  • tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_img2img.py
  • tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_inpaint.py
  • tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_edit.py

Usage

  • QwenImagePipeline
import mindspore as ms 
from mindone.diffusers import QwenImagePipeline 

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16) 
prompt = "A cat holding a sign that says hello world" 
# Depending on the variant being used, the pipeline call will slightly vary. 
# Refer to the pipeline documentation for more details. 
image = pipe(prompt, num_inference_steps=50)[0][0] 
image.save("qwenimage.png") 
  • QwenImageImg2ImgPipeline
import mindspore as ms 
from mindone.diffusers import QwenImageImg2ImgPipeline
from mindone.diffusers.utils import load_image

pipe = QwenImageImg2ImgPipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16)
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney"
images = pipe(prompt=prompt, negative_prompt=" ", image=init_image, strength=0.95)[0][0]
images.save("qwenimage_img2img.png")
  • QwenImageInpaintPipeline
import mindspore as ms 
from mindone.diffusers import QwenImageInpaintPipeline 
from mindone.diffusers.utils import load_image 

pipe = QwenImageInpaintPipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16) 
prompt = "Face of a yellow cat, high resolution, sitting on a park bench" 
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" 
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" 
source = load_image(img_url) 
mask = load_image(mask_url) 
image = pipe(prompt=prompt, negative_prompt=" ", image=source, mask_image=mask, strength=0.85)[0][0] 
image.save("qwenimage_inpainting.png") 
  • QwenImageEditPipeline
import mindspore as ms 
from PIL import Image 
from mindone.diffusers import QwenImageEditPipeline 
from mindone.diffusers.utils import load_image 

pipe = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", mindspore_dtype=ms.bfloat16) 
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/yarn-art-pikachu.png").convert("RGB") 
prompt = ("Make Pikachu hold a sign that says 'Qwen Edit is awesome', yarn art style, detailed, vibrant colors") 
# Depending on the variant being used, the pipeline call will slightly vary. 
# Refer to the pipeline documentation for more details. 
image = pipe(image, prompt, num_inference_steps=50)[0][0] 
image.save("qwenimage_edit.png") 
  • QwenImageEditInpaintPipeline
import mindspore as ms 
from PIL import Image
from mindone.diffusers import QwenImageEditInpaintPipeline
from mindone.diffusers.utils import load_image

pipe = QwenImageEditInpaintPipeline.from_pretrained("Qwen/Qwen-Image-Edit", mindspore_dtype=ms.bfloat16)
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
source = load_image(img_url)
mask = load_image(mask_url)
image = pipe(prompt=prompt, negative_prompt=" ", image=source, mask_image=mask, strength=1.0, num_inference_steps=50)[0][0]
image.save("qwenimage_inpainting.png")

Performance

(Inference experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.7.0 in PyNative mode.)

Pipeline | Weight Loading Time | Mode | Speed
QwenImagePipeline | 15m21s | PyNative | 9.93 s/it
QwenImageImg2ImgPipeline | 14m57s | PyNative | 9.56 s/it
QwenImageInpaintPipeline | 10m10s | PyNative | 4.80 s/it
QwenImageEditPipeline | 13m57s | PyNative | 13.25 s/it
QwenImageEditInpaintPipeline | 13m20s | PyNative | 13.98 s/it

Limitation

QwenImageEditPipeline and QwenImageEditInpaintPipeline load modules from Qwen-Image-Edit. Using these two pipelines currently requires manually changing image_processor_type from Qwen2VLImageProcessorFast to Qwen2VLImageProcessor in Qwen-Image-Edit/processor/preprocessor_config.json.
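For convenience, a minimal sketch of applying that edit programmatically (the local path is illustrative and assumes the Qwen/Qwen-Image-Edit snapshot has already been downloaded):

import json
from pathlib import Path

# Illustrative path; point this at your local Qwen-Image-Edit snapshot.
config_path = Path("Qwen-Image-Edit/processor/preprocessor_config.json")

config = json.loads(config_path.read_text())
# Switch from the fast image processor to the slow variant supported here.
config["image_processor_type"] = "Qwen2VLImageProcessor"
config_path.write_text(json.dumps(config, indent=2))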

Notes

  1. Requires transformers==4.52.1.
  2. The generated images are nearly identical to those produced by the Torch implementation when a consistent random seed and the same text-encoder hidden states are used (see the seeding sketch after this list).
  3. TODO: JIT mode; LoRA tests; UTs of modules.
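For reference, a minimal seeding sketch, assuming the pipelines accept a numpy random Generator through the generator argument as other mindone.diffusers pipelines do:

import numpy as np
import mindspore as ms
from mindone.diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16)
# A fixed generator makes runs reproducible and eases comparison against the Torch reference.
generator = np.random.default_rng(42)
image = pipe("A cat holding a sign that says hello world", num_inference_steps=50, generator=generator)[0][0]
image.save("qwenimage_seeded.png")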

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx

@Dong1017 Dong1017 requested a review from vigo999 as a code owner September 17, 2025 06:53
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @Dong1017, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the QwenImage model family into the Diffusers library, providing a robust set of pipelines for various image generation and editing tasks. It introduces specialized model architectures and utilities tailored for QwenImage's unique features, enhancing the library's capabilities for advanced diffusion models.

Highlights

  • New QwenImage Integration: Comprehensive support for QwenImage models has been added, including its unique VAE and Transformer architectures, enabling advanced image generation capabilities.
  • Diverse Pipeline Support: Five distinct pipelines for QwenImage have been introduced: text-to-image, image-to-image, inpainting, editing, and combined edit-inpainting, offering a wide range of functionalities.
  • Custom Model Components: Specialized components like AutoencoderKLQwenImage with causal 3D convolutions and QwenImageTransformer2DModel featuring dual-stream attention and rotary embeddings have been implemented.
  • LoRA Loading Capability: The QwenImageLoraLoaderMixin and associated conversion utilities were added to enable seamless loading of LoRA checkpoints for QwenImage models.
  • MindSpore Compatibility Enhancements: Core utilities related to attention processing, weight normalization, and model loading have been updated to ensure full compatibility and optimized performance within the MindSpore framework.
  • Comprehensive Unit Testing: New unit tests have been included for all newly added QwenImage pipelines, ensuring their correctness and stability.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for QwenImage pipelines and their associated modules into the Diffusers library. The changes are extensive, including new pipelines for text-to-image, image-to-image, inpainting, and editing, as well as the necessary model components like AutoencoderKLQwenImage and QwenImageTransformer2DModel. My review has identified several critical syntax errors in the newly added test files that will prevent them from running. Additionally, there are some documentation errors, including incorrect model identifiers and type hints, and a notable performance issue in the blending logic within the VAE model that could be significantly improved with vectorization. Addressing these points will enhance the correctness, usability, and performance of this new feature.

@Dong1017 Dong1017 changed the title Add pipelines and required modules of QwenImage in Diffusers Master feat(diffusers/pipelines): add pipelines and required modules of QwenImage in Diffusers Master Sep 17, 2025
adapter_name=adapter_name,
metadata=metadata,
_pipeline=_pipeline,
low_cpu_mem_usage=low_cpu_mem_usage,
Contributor


These lines can be removed for now; we don't support this yet.

)
encoder_hidden_states = self.text_encoder(
input_ids=ms.Tensor(txt_tokens.input_ids),
attention_mask=ms.Tensor(txt_tokens.attention_mask),
Contributor


Prefer ms.tensor when creating tensors; there should be some more similar Tensor usages later on as well.
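For illustration, the reviewer's suggestion applied to the excerpt above would look roughly like this:

encoder_hidden_states = self.text_encoder(
    input_ids=ms.tensor(txt_tokens.input_ids),
    attention_mask=ms.tensor(txt_tokens.attention_mask),
)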

Author


The other pipelines will be updated accordingly as well.

pt_pipe = pt_pipe.to(pt_dtype)
ms_pipe = ms_pipe.to(ms_dtype)

sys.modules[ms_pipe.__module__].randn_tensor = randn_tensor
Contributor


Why does this need to be replaced separately here?

Author

@Dong1017 Dong1017 Sep 25, 2025


In the img2img pipe the replacement is done separately to keep the randomness consistent; the text2img, inpaint, and edit pipes do not need this separate setting, so the redundant lines will be removed.

@SamitHuang
Collaborator

How to fix the requirement of transformers==4.52.1?

)
latents = latents / latents_std + latents_mean
# TODO: we use pynative mode here since cache in vae.decode which not supported in graph mode
with pynative_context():
Collaborator


Is this necessary, given that the default MindSpore mode (and the supported mode) is PyNative?

Author

@Dong1017 Dong1017 Sep 25, 2025


Thanks for the review. with pynative_context() will be removed, since a double-check shows that its impact on the results is minimal. Tests for all pipelines are in progress.

[CLIPTokenizer](https://huggingface.co/docs/transformers/en/model_doc/clip#transformers.CLIPTokenizer).
"""

model_cpu_offload_seq = "text_encoder->transformer->vae"
Collaborator


Is the offloading seq supported?

@SamitHuang
Collaborator

Could you add an inference example and a LoRA fine-tuning example in the examples folder? That would help introduce QwenImage.

@SamitHuang SamitHuang mentioned this pull request Sep 22, 2025