Complete Kohya LoRA conversion for Qwen and Z-Image #14080
…erscores _convert_non_diffusers_z_image_lora_to_diffusers reverses Kohya's `.`->`_` flattening with a blanket `_`->`.` split, guarded only by a small protected-n-gram list (attention to_q/k/v/out, feed_forward) plus post-hoc fixes for context_refiner/noise_refiner. Z-Image's other modules whose names contain underscores were over-split: all_final_layer, all_x_embedder, adaLN_modulation, cap_embedder and t_embedder came out as all.final.layer, adaLN.modulation, ... and failed to load with "unexpected keys". Extend the existing dot->underscore post-normalization to re-merge these names, so Kohya (lora_unet_) Z-Image LoRAs load. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
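The re-merge step described in the commit message above can be illustrated with a minimal sketch. It is not the diffusers implementation: the helper name `remerge_underscore_names` and the sample keys are hypothetical, and only the five module names are taken from the commit message. It assumes the key has already been over-split into a fully dotted form.

```python
import re

# Module names taken from the commit message; each contains underscores
# that the blanket "_" -> "." split turns into dots.
UNDERSCORE_NAMES = (
    "all_final_layer",
    "all_x_embedder",
    "adaLN_modulation",
    "cap_embedder",
    "t_embedder",
)


def remerge_underscore_names(key: str) -> str:
    """Hypothetical helper: restore underscores in over-split module names.

    A name is only re-merged when it occupies whole dot-separated segments,
    so "t.embedder" is not matched inside a longer segment such as
    "foot.embedder".
    """
    for name in UNDERSCORE_NAMES:
        dotted = re.escape(name.replace("_", "."))
        # Must be preceded by "." or the start, and followed by "." or the end.
        pattern = r"(?<![^.])" + dotted + r"(?![^.])"
        key = re.sub(pattern, name, key)
    return key


# Illustrative keys only; real Z-Image keys may have a different layout.
print(remerge_underscore_names("all.final.layer.adaLN.modulation.alpha"))
print(remerge_underscore_names("t.embedder.mlp.0.lora_A.weight"))
```

Because the substitution runs over the whole key, the trailing `.lora_A`/`.lora_B`/`.alpha` part needs no special handling in this sketch.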
_convert_non_diffusers_qwen_lora_to_diffusers's convert_key hardcodes the transformer_blocks prefix and assumes every lora_unet_ key lives under a block: it strips a transformer_blocks_ prefix and re-prepends transformer_blocks., which collapses the top-level modules (img_in, txt_in, proj_out, norm_out.linear, time_text_embed.timestep_embedder.linear_1/2) onto each other. They end up as transformer_blocks..weight / ...a.down.weight and trip the 'state_dict should be empty' guard. Resolve these six modules via an explicit flattened->dotted map before the block logic runs, preserving the .lora_down/.lora_up/.alpha suffix, so Kohya (lora_unet_) Qwen LoRAs load. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
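As a rough illustration of the explicit map described in the commit message above, here is a minimal sketch. It is not the code from this PR: the names `TOP_LEVEL_MODULES` and `resolve_top_level` are hypothetical, and the flattened spellings are assumed to be the dotted module paths with `.` replaced by `_`. It relies on the Kohya convention that the flattened module name contains no dots, so the first `.` in the key starts the suffix.

```python
from typing import Optional

# Hypothetical flattened -> dotted map for the six top-level modules named
# in the commit message. The flattened spellings are an assumption.
TOP_LEVEL_MODULES = {
    "img_in": "img_in",
    "txt_in": "txt_in",
    "proj_out": "proj_out",
    "norm_out_linear": "norm_out.linear",
    "time_text_embed_timestep_embedder_linear_1": "time_text_embed.timestep_embedder.linear_1",
    "time_text_embed_timestep_embedder_linear_2": "time_text_embed.timestep_embedder.linear_2",
}


def resolve_top_level(key: str, prefix: str = "lora_unet_") -> Optional[str]:
    """Return the dotted key for a top-level module, or None.

    None means the key is not one of the six top-level modules and should
    fall through to the existing transformer_blocks logic.
    """
    if not key.startswith(prefix):
        return None
    # Everything before the first "." is the flattened module name; the rest
    # is the suffix (lora_down/lora_up weights or alpha), kept unchanged.
    module, sep, suffix = key[len(prefix):].partition(".")
    dotted = TOP_LEVEL_MODULES.get(module)
    if dotted is None or not sep:
        return None
    return f"{dotted}.{suffix}"


print(resolve_top_level("lora_unet_norm_out_linear.lora_down.weight"))
print(resolve_top_level("lora_unet_transformer_blocks_0_attn_to_q.alpha"))
```

Checking the map first means block keys are untouched by this step, which matches the "before the block logic runs" ordering in the commit message.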
Hi @dxqb, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. |
I could open an issue, but it would just repeat what the PR summary already says. Please add |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
What does this PR do?
This PR adds some missing layers to the already implemented Kohya-to-diffusers conversion code.
Most of these layers live outside the transformer blocks, but one lives inside a transformer block.
I guess these weren't included initially because kohya-ss/musubi-tuner doesn't train them by default, but they can be trained with kohya-ss/musubi-tuner and with other trainers that output the Kohya format.
Details:
Qwen: top-level (non-block) modules
`convert_key` assumes every key lives under `transformer_blocks` and strips/re-prepends that prefix. The six top-level modules (`img_in`, `txt_in`, `proj_out`, `norm_out.linear`, `time_text_embed.timestep_embedder.linear_1/2`) collapse onto each other and trip the `state_dict should be empty` check. They're now resolved via an explicit flattened→dotted map before the block logic, preserving the `.lora_down`/`.lora_up`/`.alpha` suffix.
Z-Image: module names that contain underscores
The blanket `_`→`.` split over-splits modules whose own names contain underscores (`all_final_layer`, `all_x_embedder`, `adaLN_modulation`, `cap_embedder`, `t_embedder`), so they arrive as `all.final.layer`, `adaLN.modulation`, … and fail with "unexpected keys". The existing dotted→underscore post-normalization is extended to re-merge these names (it runs on the full key, so `.lora_A`/`B` and `.alpha` are handled alike).
Who can review?