
follow-up refactor on lumina2 #10776

Merged: 9 commits merged into main on Feb 15, 2025
Conversation

yiyixuxu (Collaborator) commented on Feb 12, 2025:

This PR:

  1. Refactors and simplifies RoPE: removed all the logic related to different image sizes (we do not need to support this for inference).
  2. For now, switches the default for use_mask_in_transformer to False because:
    • for a single prompt (the most common use case), the outputs are identical and we get a performance gain with use_mask_in_transformer=False (see details in #10776 (comment));
    • for a list of prompts, the mask should be used (see context in Add support for lumina2 #10642 (comment)). I think we could remove this argument from the pipeline entirely and automatically use the mask when a list of prompts is passed (and otherwise set it to False); that idea is sketched below.
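A minimal sketch of that auto-selection idea (the helper name is hypothetical, not actual pipeline code):

# Hypothetical helper illustrating the proposed default (not the actual
# diffusers implementation): a single prompt never needs padding, so the
# attention mask only matters when a batch of prompts is passed.
def should_use_mask(prompt) -> bool:
    return isinstance(prompt, (list, tuple)) and len(prompt) > 1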

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

- encoder_hidden_states = layer(
-     encoder_hidden_states, attention_mask if use_mask_in_transformer else None, encoder_rotary_emb
- )
+ encoder_hidden_states = layer(encoder_hidden_states, encoder_attention_mask, context_rotary_emb)
yiyixuxu (Collaborator, Author) commented:

The slight difference we see in the output without the mask actually comes from here. I didn't see any effect on speed, so I set it to always use encoder_attention_mask for the context_refiner layers.

With this, for a single prompt we get identical output for use_mask_in_transformer=True and use_mask_in_transformer=False.
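As a sanity check on why a single prompt is unaffected: with no padding the attention mask is all True, so masked and unmasked attention coincide. A standalone PyTorch sketch (illustrative shapes, not diffusers code):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

# With a single prompt there is no padding, so the mask is all True.
full_mask = torch.ones(1, 1, 16, 16, dtype=torch.bool)

out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=full_mask)
out_unmasked = F.scaled_dot_product_attention(q, k, v, attn_mask=None)
print(torch.allclose(out_masked, out_unmasked))  # True: all-True mask is a no-op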

Testing script:
# test lumina2
import torch
from diffusers import Lumina2Text2ImgPipeline
import itertools
from pathlib import Path
import shutil


device = "cuda:1"

branch = "refactor_lumina2"
# branch = "main"
params = {
    'use_mask_in_transformer': [True, False],  
}

# Generate all combinations
param_combinations = list(itertools.product(*params.values()))

# Create output directory (remove if exists)
output_dir = Path(f"yiyi_test_6_outputs_{branch}")
if output_dir.exists():
    shutil.rmtree(output_dir)
output_dir.mkdir(exist_ok=True)

pipe = Lumina2Text2ImgPipeline.from_pretrained("Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16).to(device)

prompt = [
    "focused exterior view on a living room of a limestone rock regular blocks shaped villa with sliding windows and timber screens in Provence along the cliff facing the sea, with waterfalls from the roof to the pool, designed by Zaha Hadid, with rocky textures and form, made of regular giant rock blocks stacked each other with infinity edge pool in front of it, blends in with the surrounding nature. Regular rock blocks. Giant rock blocks shaping the space. The image to capture the infinity edge profile of the pool and the flow of water going down creating a waterfall effect. Adriatic Sea. The design is sustainable and semi prefab. The photo is shot on a canon 5D mark 4",
    # "A capybara holding a sign that reads Hello World"
]

# Run test for each combination
for (mask,) in param_combinations:
    print(f"\nTesting combination:")
    print(f"  use_mask_in_transformer: {mask}")
    
    # Generate image
    generator = torch.Generator(device=device).manual_seed(0)
    images = pipe(
        prompt=prompt,
        num_inference_steps=25,
        use_mask_in_transformer=mask,
        generator=generator,
    ).images
    
    # Save images
    for i, image in enumerate(images):
        output_path = output_dir / f"output_mask{int(mask)}_prompt{i}.png"
        image.save(output_path)
        print(f"Saved to: {output_path}")

asomoza (Member) commented on Feb 13, 2025:

Nice! I can confirm that I get the same image, which in turn is better for text generation, since it sometimes failed before without a mask.

yiyixuxu (Collaborator, Author) commented:

@asomoza @a-r-r-o-w @hlky
Are we OK with removing the use_mask_in_transformer argument from the pipeline (so users can no longer set it) and only using the mask in the transformer when we need to (multiple prompts -> multiple values for cap_seq_len in the transformer)?
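For context on the multiple-prompts case, a generic illustration (using an arbitrary public tokenizer, not necessarily Lumina2's) of why a prompt list produces different caption lengths and therefore needs padding plus a mask:

from transformers import AutoTokenizer

# Illustrative only: any tokenizer shows the effect.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
prompts = ["a cat", "a capybara holding a sign that reads Hello World"]
enc = tok(prompts, padding=True, return_tensors="pt")
print(enc["input_ids"].shape)        # batch padded to the longest prompt
print(enc["attention_mask"].sum(1))  # per-prompt real lengths differ -> mask needed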

a-r-r-o-w (Member) commented:

Sounds good @yiyixuxu

yiyixuxu merged commit 69f919d into main on Feb 15, 2025 (15 checks passed) and deleted the lumina2-refactor branch on February 15, 2025 at 00:57.