@akshan-main akshan-main commented Nov 22, 2025

What does this PR do?

Enables batch inference for QwenImageEditPlusPipeline by normalizing input tensor shapes and handling variable-length prompt embeddings.

Description

Addresses issue #12458.

I identified two blockers preventing batch inference in the current pipeline:

  1. 5D Tensor Requirement: The underlying Qwen2-VL model treats inputs as video (B, C, F, H, W). The pipeline was passing 4D tensors (B, C, H, W), causing immediate shape mismatches.

    • Fix: Added a pre-processing step to explicitly unsqueeze the frame dimension for static images when batch_size > 1.
  2. Tokenizer Batching Issues: The Qwen2VLProcessor produces variable-length embeddings for different prompts, which caused a RuntimeError or IndexError when batch-encoding them directly.

    • Fix: Refactored encode_prompt to process prompts individually in a loop, then pad the resulting embeddings to the maximum sequence length in the batch before concatenating.
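The two fixes above can be sketched as standalone helpers. This is a minimal illustration, not the actual pipeline code; the function names (`ensure_video_dims`, `pad_and_stack`) are hypothetical, and the padding strategy (right-pad with zeros to the batch max, assumed here) should match whatever attention-masking the real `encode_prompt` uses:

```python
import torch
import torch.nn.functional as F


def ensure_video_dims(images: torch.Tensor) -> torch.Tensor:
    """Fix 1 (sketch): lift a 4D image batch (B, C, H, W) to the 5D
    video layout (B, C, F, H, W) that Qwen2-VL expects, with F = 1."""
    if images.ndim == 4:
        images = images.unsqueeze(2)  # insert the frame dimension
    return images


def pad_and_stack(embeds: list[torch.Tensor]) -> torch.Tensor:
    """Fix 2 (sketch): right-pad per-prompt embeddings of shape
    (seq_len, dim) to the max seq_len in the batch, then stack to
    (B, max_len, dim) so they can be concatenated for batched inference."""
    max_len = max(e.shape[0] for e in embeds)
    # F.pad pads the last dims first: (0, 0) leaves dim alone,
    # (0, max_len - seq_len) right-pads the sequence axis with zeros.
    padded = [F.pad(e, (0, 0, 0, max_len - e.shape[0])) for e in embeds]
    return torch.stack(padded, dim=0)
```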

This ensures robust batching for both images and prompts. I also added checks to handle the tuple vs list input ambiguity reported in the original issue.
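The tuple-vs-list check can be as simple as normalizing whatever container the caller passes into a plain list before any `len()`/indexing logic runs. A minimal sketch (the helper name `normalize_image_input` is hypothetical):

```python
def normalize_image_input(image):
    """Accept a single image, a tuple, or a list of images and always
    return a list, so downstream batching logic sees one uniform type."""
    if isinstance(image, tuple):
        return list(image)
    if not isinstance(image, list):
        return [image]  # wrap a single image into a batch of one
    return image
```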

Note on Batching Logic

To resolve the ambiguity between "Multi-Image Conditioning" and "Batch Inference", I implemented the following routing logic in encode_prompt:

  1. Single String Prompt (prompt="string"):

    • Behavior: Joint Condition. The pipeline treats all provided images as a single context for one generation task.
    • Use Case: Style transfer or merging elements from multiple reference images.
  2. List of Prompts (prompt=["s1", "s2"]):

    • Behavior: Parallel Batch. The pipeline maps images to prompts 1-to-1.
    • Use Case: Processing a dataset, e.g. editing 50 different images with 50 different instructions in a single call, hardware permitting.
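The routing logic above can be summarized as a small dispatch sketch. This is an illustration of the convention, not the pipeline's actual code; the function name `route_prompts` and the `(prompt, images)` task tuples are hypothetical:

```python
def route_prompts(prompt, images):
    """Route inputs per the batching convention:
    - str prompt   -> Joint Condition: all images form one context
                      for a single generation task.
    - list prompt  -> Parallel Batch: images map to prompts 1-to-1.
    Returns a list of (prompt, image_context) tasks."""
    if isinstance(prompt, str):
        return [(prompt, images)]  # one task conditioned on every image
    if len(prompt) != len(images):
        raise ValueError(
            f"Got {len(prompt)} prompts but {len(images)} images; "
            "a list of prompts requires a 1-to-1 image mapping."
        )
    return [(p, [img]) for p, img in zip(prompt, images)]
```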


Fixes #12458

Before submitting

  • This PR fixes a typo or improves the docs.
  • Did you read the contributor guideline?
  • Did you read our philosophy doc?
  • Was this discussed/approved via a GitHub issue? (Issue [Qwen-image-edit] Batch Inference Issue / Feature Request #12458)
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? (Verified via reproduction script)

Who can review?

@yiyixuxu @sayakpaul @DN6
