Feature: Add Z-Image-Turbo model support #8671

Pfannkuchensack · 2025-11-30T23:28:39Z

Add comprehensive support for Z-Image-Turbo (S3-DiT) models including:

Backend:

New BaseModelType.ZImage in taxonomy
Z-Image model config classes (ZImageTransformerConfig, Qwen3TextEncoderConfig)
Model loader for Z-Image transformer and Qwen3 text encoder
Z-Image conditioning data structures
Step callback support for Z-Image with FLUX latent RGB factors

Invocations:

z_image_model_loader: Load Z-Image transformer and Qwen3 encoder
z_image_text_encoder: Encode prompts using Qwen3 with chat template
z_image_denoise: Flow matching denoising with time-shifted sigmas
z_image_image_to_latents: Encode images to 16-channel latents
z_image_latents_to_image: Decode latents using FLUX VAE

Frontend:

Z-Image graph builder for text-to-image generation
Model picker and validation updates for z-image base type
CFG scale now allows 0 (required for Z-Image-Turbo)
Clip skip disabled for Z-Image (uses Qwen3, not CLIP)
Optimal dimension settings for Z-Image (1024x1024)

Technical details:

Uses Qwen3 text encoder (not CLIP/T5)
16 latent channels with FLUX-compatible VAE
Flow matching scheduler with dynamic time shift
8 inference steps recommended for Turbo variant
bfloat16 inference dtype
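
As a rough illustration of the "flow matching scheduler with dynamic time shift" item, the sketch below builds a linearly spaced sigma schedule and applies an exponential time shift of the kind used by FLUX-style schedulers. The function names, the interpolation constants (`base_shift=0.5`, `max_shift=1.15`, sequence lengths 256 and 4096), and the idea that Z-Image uses this exact formula are assumptions for illustration, not taken from this PR.

```python
import math


def compute_mu(image_seq_len, base_seq_len=256, max_seq_len=4096,
               base_shift=0.5, max_shift=1.15):
    """Linearly interpolate the shift parameter from the image token count.

    All default constants are assumed values, not read from the PR.
    """
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    return base_shift + slope * (image_seq_len - base_seq_len)


def time_shift(mu, sigma):
    """Exponential time shift: pushes interior sigmas towards 1.0 when mu > 0."""
    if sigma <= 0.0:
        return 0.0  # avoid dividing by zero at the final step
    return math.exp(mu) / (math.exp(mu) + (1.0 / sigma - 1.0))


def shifted_sigmas(num_steps, image_seq_len):
    """Return num_steps + 1 sigmas, from 1.0 (pure noise) down to 0.0."""
    mu = compute_mu(image_seq_len)
    raw = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [time_shift(mu, s) for s in raw]


# Example: 8 steps with an assumed 4096 image tokens.
sigmas = shifted_sigmas(8, 4096)
```

With a positive `mu`, more of the step budget is spent at high noise levels, which is the usual motivation for shifting the schedule at larger resolutions.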

Summary

Related Issues / Discussions

QA Instructions

Install a Z-Image-Turbo model (e.g., from HuggingFace)
Select the model in the Model Picker
Generate a text-to-image with:
CFG Scale: 0
Steps: 8
Resolution: 1024x1024
Verify the generated image is coherent (not noise)

Merge Plan

Standard merge, no special considerations needed.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

Add comprehensive LoRA support for Z-Image models including:

Backend:
- New Z-Image LoRA config classes (LoRA_LyCORIS_ZImage_Config, LoRA_Diffusers_ZImage_Config)
- Z-Image LoRA conversion utilities with key mapping for transformer and Qwen3 encoder
- LoRA prefix constants (Z_IMAGE_LORA_TRANSFORMER_PREFIX, Z_IMAGE_LORA_QWEN3_PREFIX)
- LoRA detection logic to distinguish Z-Image from Flux models
- Layer patcher improvements for proper dtype conversion and parameter

lstein · 2025-12-02T01:55:31Z

Very impressive. The model is working with acceptable performance even on my 12 GB RAM card.

I notice the following message in the error log:

[2025-12-01 20:50:58,822]::[ModelManagerService]::WARNING --> [MODEL CACHE] Failed to calculate model size for unexpected model type: <class 'transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer'>. The model will be treated as having size 0.

Would it be possible to add support for the quantized models, e.g. T5B/Z-Image-Turbo-FP8 or jayn7/Z-Image-Turbo-GGUF ?

Pfannkuchensack · 2025-12-02T02:01:19Z

I'll take a look at it and report back.

lstein · 2025-12-02T02:35:52Z

I tried two huggingface LoRAs that claim to be based on z-image, but they were detected as Flux lycoris models:

reverentelusarca/elusarca-anime-style-lora-z-image-turbo
tarn59/pixel_art_style_lora_z_image_turbo

…ntification Move Flux layer structure check before metadata check to prevent misidentifying Z-Image LoRAs (which use `diffusion_model.layers.X`) as Flux AI Toolkit format. Flux models use `double_blocks` and `single_blocks` patterns which are now checked first regardless of metadata presence.
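
The ordering described in this commit (check the Flux block structure first, then fall through to the Z-Image layer pattern) can be sketched roughly as follows. The function name, the regular expressions, and the example key names are hypothetical; the real detection logic in the PR may inspect more than key names.

```python
import re

# Flux LoRAs reference double_blocks / single_blocks in their keys.
FLUX_BLOCK_PATTERN = re.compile(r"(^|\.)(double_blocks|single_blocks)\.\d+\.")
# Z-Image LoRAs use diffusion_model.layers.X (per the commit message).
Z_IMAGE_LAYER_PATTERN = re.compile(r"^diffusion_model\.layers\.\d+\.")


def classify_lora_keys(keys):
    """Classify a LoRA state dict by key names alone (illustrative only)."""
    keys = list(keys)
    # Check the Flux structure first, regardless of any metadata.
    if any(FLUX_BLOCK_PATTERN.search(k) for k in keys):
        return "flux"
    if any(Z_IMAGE_LAYER_PATTERN.search(k) for k in keys):
        return "z-image"
    return "unknown"


# Hypothetical key names for demonstration.
flux_keys = ["diffusion_model.double_blocks.0.img_attn.qkv.lora_A.weight"]
z_image_keys = ["diffusion_model.layers.3.attention.to_q.lora_A.weight"]
```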

…ibility

Add comprehensive support for GGUF quantized Z-Image models and improve component flexibility:

Backend:
- New Main_GGUF_ZImage_Config for GGUF quantized Z-Image transformers
- Z-Image key detection (_has_z_image_keys) to identify S3-DiT models
- GGUF quantization detection and sidecar LoRA patching for quantized models
- Qwen3Encoder_Qwen3Encoder_Config for standalone Qwen3 encoder models

Model Loader:
- Split Z-Image model

Pfannkuchensack · 2025-12-03T01:26:02Z

I tried both of the LoRAs, and both of them get imported as Z-Image LoRAs.

…kuchensack/InvokeAI into feat/z-image-turbo-support

lstein · 2025-12-05T01:41:28Z

When running upscaling, diffusers 0.36.0.dev0 dies because the diffusers.models.controlnet module has been renamed to diffusers.models.controlnets.controlnet. I suggest applying this patch to fix the issue:

diff --git a/invokeai/backend/util/hotfixes.py b/invokeai/backend/util/hotfixes.py
index 7e258b8779..1609fe12c4 100644
--- a/invokeai/backend/util/hotfixes.py
+++ b/invokeai/backend/util/hotfixes.py
@@ -5,7 +5,6 @@ import torch
 from diffusers.configuration_utils import ConfigMixin, register_to_config
 from diffusers.loaders.single_file_model import FromOriginalModelMixin
 from diffusers.models.attention_processor import AttentionProcessor, AttnProcessor
-from diffusers.models.controlnet import ControlNetConditioningEmbedding, ControlNetOutput, zero_module
 from diffusers.models.embeddings import (
     TextImageProjection,
     TextImageTimeEmbedding,
@@ -13,6 +12,7 @@ from diffusers.models.embeddings import (
     TimestepEmbedding,
     Timesteps,
 )
+from diffusers.models.controlnets.controlnet import ControlNetConditioningEmbedding, ControlNetOutput, zero_module
 from diffusers.models.modeling_utils import ModelMixin
 from diffusers.models.unets.unet_2d_blocks import (
     CrossAttnDownBlock2D,
@@ -777,7 +777,7 @@ class ControlNetModel(ModelMixin, ConfigMixin, FromOriginalModelMixin):
 
 
 diffusers.ControlNetModel = ControlNetModel
-diffusers.models.controlnet.ControlNetModel = ControlNetModel
+diffusers.models.controlnets.controlnet.ControlNetModel = ControlNetModel
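
If both the old and the new diffusers module layouts need to keep working, one option is to try the candidate module paths in order instead of hard-coding one of them. This is a minimal sketch of that idea, not what the patch above does; the helper name `import_first_available` is hypothetical.

```python
import importlib


def import_first_available(module_paths):
    """Import and return the first module path that resolves.

    Raises ImportError listing every failed candidate if none can be imported.
    """
    errors = []
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{path}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported: " + "; ".join(errors)
    )


# Intended use in hotfixes.py (left as a comment so the sketch runs without diffusers):
# controlnet_module = import_first_available([
#     "diffusers.models.controlnets.controlnet",  # newer layout
#     "diffusers.models.controlnet",              # older layout
# ])
# ControlNetOutput = controlnet_module.ControlNetOutput
```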

blessedcoolant · 2025-12-08T20:47:43Z

Think this needs support for loading in the repackaged safetensors versions of the models that people use with Comfy - the default fp16 version and the fp8 model. People will likely try to load those model files as the transformer and also as the text encoder and share between the two programs.

lstein · 2025-12-08T22:36:41Z

I've tested multiple LoRAs and they import and work correctly.

lstein

I asked Copilot (GPT-5 mini) to identify edge cases and potential security issues, and here are its high-priority findings. Let me know if you find this annoying. If not, I can ask Copilot to generate a PR with proposed fixes.

High-priority findings

Z-Image inference assumes bfloat16 (torch.bfloat16) everywhere

Affected files: invokeai/app/invocations/z_image_denoise.py, invokeai/backend/model_manager/load/model_loaders/z_image.py, invokeai/backend/quantization/gguf/ggml_tensor.py, others.
Issue: inference_dtype is hard-coded to torch.bfloat16. Not all devices and PyTorch builds support bfloat16 (especially many CUDA GPUs and CPU builds). This will raise errors or produce incorrect results on unsupported hardware.
Impact: runtime failures on many GPUs/CPUs; user-facing crashes when selecting Z-Image models.

Recommended fix:

Detect device and dtype capability and choose a safe fallback (torch.float16 or torch.float32) when bfloat16 is not supported.

Example pattern:

device = TorchDevice.choose_torch_device()
if device.type == "cuda" and torch.cuda.is_bf16_supported():
    inference_dtype = torch.bfloat16
elif device.type == "cuda":
    inference_dtype = torch.float16
else:
    inference_dtype = torch.float32

Add a clear warning in logs when falling back.
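
A minimal sketch of the fallback-with-warning idea, written with dtype names as plain strings so it runs without PyTorch installed. The function name and logger name are hypothetical; in real code the `bf16_supported` flag would come from something like `torch.cuda.is_bf16_supported()`.

```python
import logging

logger = logging.getLogger("z_image_dtype")  # hypothetical logger name


def choose_inference_dtype_name(device_type, bf16_supported):
    """Return the dtype name to use and log a warning whenever bfloat16 is not chosen."""
    if device_type == "cuda" and bf16_supported:
        return "bfloat16"
    # Mirror the pattern above: float16 on CUDA, float32 everywhere else.
    fallback = "float16" if device_type == "cuda" else "float32"
    logger.warning(
        "bfloat16 not selected for device type %r; falling back to %s.",
        device_type,
        fallback,
    )
    return fallback
```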

LoRA parameter application not device-aware — possible CPU/CUDA dtype/device mismatch

Affected file: invokeai/backend/patches/layer_patcher.py
Issue: when applying LoRA parameter tensors (param_weight) the code converts dtype but does not ensure the tensor is on the same device as the model parameter (module_param). Adding a CPU tensor to a CUDA tensor (or vice versa) will raise.
Impact: runtime errors when applying LoRAs to models on GPU; silent failures or incorrect behavior if tensors are not moved correctly.
Recommended fix:
- Move and cast param_weight to module_param's device before arithmetic:
```
param_weight_converted = param_weight.to(dtype=dtype, device=module_param.device)
module_param.data.add_(param_weight_converted)  # or copy_ as appropriate
```
- Prefer in-place add_ to preserve Parameter metadata; ensure dtype/device compatibility.

Unsafe use of assert(...) for runtime input/format validation

Affected files: many (examples: z_image_denoise.py: len(cond_data.conditionings) == 1 and isinstance checks; z_image_text_encoder.py hidden_state indexing).
Issue: assert statements are used to validate user-provided data and external resources. Python asserts can be disabled and they raise generic AssertionError with poor messages.
Impact: unclear error messages or suppressed checks if assertions are stripped; poor debugging experience for users.
Recommended fix:
- Replace asserts with explicit exceptions and descriptive messages (ValueError, TypeError) so errors are always raised and informative:
```
if len(cond_data.conditionings) != 1:
    raise ValueError("expected exactly 1 conditioning entry for Z-Image, got ...")
```

Tokenizer / text-encoder assumptions may crash on incompatible transformers/tokenizers

Affected file: invokeai/app/invocations/z_image_text_encoder.py
Issues:
- Assumes tokenizer has apply_chat_template(...) — not guaranteed for all tokenizer versions.
- Assumes outputs.hidden_states contains at least 2 entries and blindly uses outputs.hidden_states[-2]; can IndexError.
- Prompts may be truncated to zero tokens; no explicit handling.
Impact: runtime exceptions when encoding prompts with different tokenizer/text-encoder versions; poor UX for users.
Recommended fix:
- Guard for apply_chat_template existence and fallback to simple formatting if absent.
- Validate hidden_states length; if not present, fall back to outputs.last_hidden_state.
- Check that the masked prompt has at least one token after truncation; raise a clear error if empty.
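
The second and third guards could look roughly like the sketch below. It uses plain Python objects in place of real encoder outputs, and the helper names are hypothetical; whether `last_hidden_state` is actually present depends on the model class in use.

```python
from types import SimpleNamespace


def select_prompt_hidden_state(outputs):
    """Prefer the penultimate hidden state; fall back to last_hidden_state."""
    hidden_states = getattr(outputs, "hidden_states", None)
    if hidden_states is not None and len(hidden_states) >= 2:
        return hidden_states[-2]
    last = getattr(outputs, "last_hidden_state", None)
    if last is None:
        raise ValueError(
            "Text encoder returned fewer than 2 hidden states and no last_hidden_state."
        )
    return last


def require_nonempty_prompt(attention_mask):
    """Return the number of kept tokens, raising if truncation left none."""
    kept = sum(1 for m in attention_mask if m)
    if kept == 0:
        raise ValueError("Prompt has no tokens left after truncation/masking.")
    return kept


# Stand-in outputs for demonstration.
full = SimpleNamespace(hidden_states=["h0", "h1", "h2"], last_hidden_state="h2")
short = SimpleNamespace(hidden_states=["h0"], last_hidden_state="h0")
```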

GGUF -> diffusers key conversion and model loading heuristics are fragile

Affected file: invokeai/backend/model_manager/load/model_loaders/z_image.py
Issues:
- _convert_z_image_gguf_to_diffusers performs heuristic key remapping which may not cover all GGUF layouts.
- The loader assumes bfloat16 and specific model config parameters (dim/heads/etc.) without robust validation.
- Lack of clear, user-friendly errors when conversion/mapping fails.
Impact: model load failures or silently corrupted model weights if mapping is incorrect.
Recommended fix:
- Add strong validation after conversion (check that expected critical keys exist).
- Provide informative errors if the converted state dict doesn't match expected shape/keys.
- Consider unit tests and sample models to validate mapping logic.
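
The post-conversion validation could be sketched as below. The function name and the example key are hypothetical; tensors are stood in for by any object with a `.shape` attribute so the sketch runs without PyTorch.

```python
from types import SimpleNamespace


def validate_converted_state_dict(state_dict, expected_shapes):
    """Check that required keys exist and, where given, that shapes match.

    expected_shapes maps each required key to a shape tuple, or to None to
    skip the shape check for that key.
    """
    missing = sorted(k for k in expected_shapes if k not in state_dict)
    if missing:
        raise ValueError(
            f"Converted state dict is missing {len(missing)} expected key(s), "
            f"e.g. {missing[:5]}. The file may use an unsupported layout."
        )
    for key, expected in expected_shapes.items():
        if expected is None:
            continue
        actual = tuple(state_dict[key].shape)
        if actual != tuple(expected):
            raise ValueError(
                f"Key '{key}' has shape {actual}, expected {tuple(expected)}."
            )


# Hypothetical converted state dict with one stand-in tensor.
demo_state_dict = {"layers.0.attention.to_q.weight": SimpleNamespace(shape=(8, 8))}
validate_converted_state_dict(
    demo_state_dict, {"layers.0.attention.to_q.weight": (8, 8)}
)
```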

Random noise generation uses float16 on CPU — portability risk

Affected file: invokeai/app/invocations/z_image_denoise.py
Issue: noise is generated on rand_device = "cpu" with rand_dtype = torch.float16. Some PyTorch CPU builds have limited float16 support and creating float16 tensors on CPU may be unsupported or slow.
Impact: crashes or wrong dtype behavior on CPUs.
Recommended fix:
- Generate noise as float32 on CPU then cast to target dtype/device, or generate directly on target device in a supported dtype:
```
noise = torch.randn(..., dtype=torch.float32, device="cpu")
noise = noise.to(device=device, dtype=inference_dtype)
```

Sensitive/large tensor logging and debug prints

Affected files: invokeai/backend/patches/lora_conversions/z_image_lora_conversion_utils.py (logging of tensor values)
Issue: large tensors may be logged directly (e.g., logging first_alpha_val). This can bloat logs and leak data. Logging entire tensors is noisy and potentially sensitive.
Impact: large log files and possible information leakage.

Recommended fix:

Log shapes or scalar.item() when appropriate, not full tensors:

logger.info("First alpha: key=%s, shape=%s, value=%s", key, tensor.shape, float(tensor.item()) if tensor.numel()==1 else "<non-scalar>")

Pfannkuchensack · 2025-12-09T04:50:02Z

First I'll try to get the Comfy versions to work, then I'll look into the edge cases and potential security issues.

Add support for loading Z-Image transformer and Qwen3 encoder models from single-file safetensors format (in addition to existing diffusers directory format).

Changes:
- Add Main_Checkpoint_ZImage_Config and Main_GGUF_ZImage_Config for single-file Z-Image transformer models
- Add Qwen3Encoder_Checkpoint_Config for single-file Qwen3 text encoder
- Add ZImageCheckpointModel and ZImageGGUFCheckpointModel loaders with automatic key conversion from original to diffusers format
- Add Qwen3EncoderCheckpointLoader using Qwen3ForCausalLM with fast loading via init_empty_weights and proper weight tying for lm_head
- Update z_image_denoise to accept Checkpoint format models

Add support for saving and recalling Z-Image component models (VAE and Qwen3 Encoder) in image metadata.

Backend:
- Add qwen3_encoder field to CoreMetadataInvocation (version 2.1.0)

Frontend:
- Add vae and qwen3_encoder to Z-Image graph metadata
- Add Qwen3EncoderModel metadata handler for recall
- Add ZImageVAEModel metadata handler (uses zImageVaeModelSelected instead of vaeSelected to set Z-Image-specific VAE state)
- Add qwen3Encoder translation key

This enables "Recall Parameters" / "Remix Image" to restore the VAE and Qwen3 Encoder settings used for Z-Image generations.

Pfannkuchensack · 2025-12-09T06:14:34Z

I used the models from https://comfyanonymous.github.io/ComfyUI_examples/z_image/ for testing.

Add robust device capability detection for bfloat16, replacing hardcoded dtype with runtime checks that fall back to float16/float32 on unsupported hardware. This prevents runtime failures on GPUs and CPUs without bfloat16.

Key changes:
- Add TorchDevice.choose_bfloat16_safe_dtype() helper for safe dtype selection
- Fix LoRA device mismatch in layer_patcher.py (add device= to .to() call)
- Replace all assert statements with descriptive exceptions (TypeError/ValueError)
- Add hidden_states bounds check and apply_chat_template fallback in text encoder
- Add GGUF QKV tensor validation (divisible by 3 check)
- Fix CPU noise generation to use float32 for compatibility
- Remove verbose debug logging from LoRA conversion utils

…inModelConfig

The FLUX Dev license warning in model pickers used isCheckpointMainModelConfig incorrectly:
```
isCheckpointMainModelConfig(config) && config.variant === 'dev'
```
This caused a TypeScript error because CheckpointModelConfig type doesn't include the 'variant' property (it's extracted as `{ type: 'main'; format: 'checkpoint' }` which doesn't narrow to include variant).

Changes:
- Add isFluxDevMainModelConfig type guard that properly checks base='flux' AND variant='dev', returning MainModelConfig
- Update MainModelPicker and InitialStateMainModelPicker to use new guard
- Remove isCheckpointMainModelConfig as it had no other usages

The function was removed because:
1. It was only used for detecting FLUX Dev models (incorrect use case)
2. No other code needs a generic "is checkpoint format" check
3. The pattern in this codebase is specific type guards per model variant (isFluxFillMainModelModelConfig, isRefinerMainModelModelConfig, etc.)

blessedcoolant · 2025-12-09T12:43:47Z

I used the models from https://comfyanonymous.github.io/ComfyUI_examples/z_image/ for testing.

✔️ Nice job. The models are being detected correctly via the model manager.
❌ The inference seems to be fine at FP16 but on the FP8 models the following error occurs. Reference FP8 model to check: https://huggingface.co/Kijai/Z-Image_comfy_fp8_scaled/tree/main

File "~\InvokeAI\invokeai\backend\model_manager\load\model_loaders\z_image.py", line 75, in _convert_z_image_gguf_to_diffusers
    raise ValueError(
ValueError: Cannot split QKV tensor 'context_refiner.0.attention.qkv.scale_weight': first dimension (1) is not divisible by 3. The model file may be corrupted or incompatible.

This is also the same issue with a model that I manually converted on my end too.
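
The traceback is consistent with a fused-QKV split being applied to every `qkv.*` key, including a per-tensor scale entry whose first dimension is 1. The sketch below reproduces that behaviour with plain lists instead of tensors; the function name is hypothetical, and how scale entries should actually be handled (for example, reused for each of Q, K and V) is an open question that this sketch does not answer.

```python
def split_fused_qkv(name, rows):
    """Split a fused QKV tensor (modelled as a list of rows) into Q, K and V."""
    n = len(rows)
    if n % 3 != 0:
        raise ValueError(
            f"Cannot split QKV tensor '{name}': first dimension ({n}) "
            "is not divisible by 3."
        )
    third = n // 3
    return rows[:third], rows[third:2 * third], rows[2 * third:]


# A regular fused weight with 6 rows splits cleanly into three parts of 2 rows.
q, k, v = split_fused_qkv("attention.qkv.weight", [[1], [2], [3], [4], [5], [6]])

# A length-1 scale entry, as in the FP8 file, triggers the error.
try:
    split_fused_qkv("attention.qkv.scale_weight", [[0.5]])
    scale_split_failed = False
except ValueError:
    scale_split_failed = True
```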

⁉️ A secondary item would be to set the default params for the Z-Image model when it is loaded: the recommended steps of 9, CFG of 1, and so on.
❌ Also, LoRAs for Z-Image that I randomly pulled off Civit are being loaded as checkpoint models rather than LoRAs, and manually trying to update the field fails. So effectively they cannot be used at all.
✔️ Tested the GGUF models. The base model quants are working as expected.
❌ The GGUF quants for the text encoder Qwen 4B are failing to load. https://huggingface.co/Qwen/Qwen3-4B-GGUF/tree/main
❌ There are some weird artifacts at 9 steps and 1 CFG which I believe are the recommended settings for Z Image Turbo. These are not so visible in styled images but when it comes to realism they are quite prominent.

Pfannkuchensack · 2025-12-09T14:39:14Z

These are the settings from the GitHub page of the model, so I think CFG 1 is the problem there, but I will check.
num_inference_steps=9, # This actually results in 8 DiT forwards
guidance_scale=0.0, # Guidance should be 0 for the Turbo models
I'll take a look at the rest as well.

blessedcoolant · 2025-12-09T15:51:33Z

this are the settings from the github page of the model. so i think cfg 1 is the problem there. but i will check.
num_inference_steps=9, # This actually results in 8 DiT forwards
guidance_scale=0.0, # Guidance should be 0 for the Turbo models
I'll take a look at the rest as well.

Same issue with CFG set to 0 too. Another issue I found is that now that 0 CFG is possible, we cannot set it as the model default in the model manager. It bugs out. Needs fixing.

…ters

- Add Qwen3EncoderGGUFLoader for llama.cpp GGUF quantized text encoders
- Convert llama.cpp key format (blk.X., token_embd) to PyTorch format
- Handle tied embeddings (lm_head.weight ↔ embed_tokens.weight)
- Dequantize embed_tokens for embedding lookups (GGMLTensor limitation)
- Add QK normalization key mappings (q_norm, k_norm) for Qwen3
- Set Z-Image defaults: steps=9, cfg_scale=0.0, width/height=1024
- Allow cfg_scale >= 0 (was >= 1) for Z-Image Turbo compatibility
- Add GGUF format detection for Qwen3 model probing
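
To make the key conversion and tied-embedding handling concrete, here is a rough sketch of mapping llama.cpp-style GGUF tensor names to Hugging Face-style names. The mapping tables reflect commonly seen llama.cpp naming and are an assumption, as are the function names; the loader in this PR may cover more keys or use different target names.

```python
import re

# Assumed top-level name mapping (llama.cpp name -> PyTorch/HF name).
_TOP_LEVEL = {
    "token_embd.weight": "model.embed_tokens.weight",
    "output_norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}

# Assumed per-layer name mapping, including the Qwen3 QK norm layers.
_PER_LAYER = {
    "attn_norm": "input_layernorm",
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "attn_output": "self_attn.o_proj",
    "attn_q_norm": "self_attn.q_norm",
    "attn_k_norm": "self_attn.k_norm",
    "ffn_norm": "post_attention_layernorm",
    "ffn_gate": "mlp.gate_proj",
    "ffn_up": "mlp.up_proj",
    "ffn_down": "mlp.down_proj",
}

_BLOCK_KEY = re.compile(r"^blk\.(\d+)\.([a-z_]+)\.(weight|bias)$")


def convert_key(key):
    """Translate one GGUF tensor name; raise KeyError if it is not recognised."""
    if key in _TOP_LEVEL:
        return _TOP_LEVEL[key]
    match = _BLOCK_KEY.match(key)
    if match and match.group(2) in _PER_LAYER:
        layer, name, suffix = match.groups()
        return f"model.layers.{layer}.{_PER_LAYER[name]}.{suffix}"
    raise KeyError(f"Unrecognized GGUF key: {key}")


def convert_state_dict(gguf_state_dict):
    """Rename all keys and tie lm_head to embed_tokens when lm_head is absent."""
    converted = {convert_key(k): v for k, v in gguf_state_dict.items()}
    if "lm_head.weight" not in converted and "model.embed_tokens.weight" in converted:
        converted["lm_head.weight"] = converted["model.embed_tokens.weight"]
    return converted
```

Raising on unknown keys, rather than silently skipping them, makes an unsupported layout fail at load time instead of producing a partially initialised encoder.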

…rNorm

- Add CustomDiffusersRMSNorm for diffusers.models.normalization.RMSNorm
- Add CustomLayerNorm for torch.nn.LayerNorm
- Register both in AUTOCAST_MODULE_TYPE_MAPPING

Enables partial loading (enable_partial_loading: true) for Z-Image models by wrapping their normalization layers with device autocast support

…dont.

github-actions bot added api python PRs that change python files Root invocations PRs that change invocations backend PRs that change backend files frontend PRs that change frontend files python-deps PRs that change python dependencies labels Nov 30, 2025

Pfannkuchensack added 3 commits December 1, 2025 00:30

fix windows path again.

13ac16e

Fix windows path again again

eaf4742

Pfannkuchensack added 2 commits December 2, 2025 15:50

Pfannkuchensack added 2 commits December 3, 2025 03:28

Fix windows path again again again...

66729ea

Merge branch 'main' into feat/z-image-turbo-support

9f6d04c

Pfannkuchensack marked this pull request as ready for review December 4, 2025 23:46

Pfannkuchensack requested review from blessedcoolant and lstein as code owners December 4, 2025 23:46

Pfannkuchensack added 2 commits December 5, 2025 01:12

fix for the typegen-checks

4a1710b

Merge branch 'feat/z-image-turbo-support' of https://github.com/Pfann…

b28d58b

…kuchensack/InvokeAI into feat/z-image-turbo-support

Patch from @lstein for the update of diffusers

2e0cd4d

Pfannkuchensack mentioned this pull request Dec 6, 2025

[enhancement]: Support for Z-Image Turbo #8670

Open

1 task

lstein requested a review from Copilot December 8, 2025 22:59

Copilot started reviewing on behalf of lstein December 8, 2025 22:59 View session

lstein review requested due to automatic review settings December 8, 2025 23:00

lstein reviewed Dec 8, 2025

Pfannkuchensack added 2 commits December 9, 2025 06:32

Pfannkuchensack added 3 commits December 9, 2025 07:37

fix typegen wrong

3e862ce

Pfannkuchensack added 4 commits December 10, 2025 03:07

fix typegen

8551ff8

z-image-turbo-fp8-e5m2 works. the z-image-turbo_fp8_scaled_e4m3fn_KJ …

f9605e1

…dont.
