New model: Anima #1487

Open
dxqb wants to merge 20 commits into Nerogar:master from dxqb:anima

Conversation

@dxqb

dxqb commented May 30, 2026

Collaborator

Test in preview branch: https://github.com/Nerogar/OneTrainer/tree/preview

Includes:

dxqb and others added 9 commits March 25, 2026 00:39
- Bump requirements: transformers 4.57.6 → 5.9, huggingface-hub 0.34.4 → 1.16.1
- Remove HF_HUB_DISABLE_XET workaround from startup scripts; Xet is stable in hub 1.16
- Remove _prepare_sub_modules / snapshot_download prefetching; hub 1.16 fetches lazily on demand
- Delete thread_safety.py and apply_thread_safe_forward calls; the issue they worked around
  (transformers#42673) was fixed upstream in v5
- Replace _remove_added_embeddings_from_tokenizer (relied on internal Trie, removed in v5) with
  orig_tokenizer deep-copies stored at load time; model savers pass use_original_tokenizers=True
  to create_pipeline() so saved checkpoints use the unmodified tokenizer
- Switch ErnieModelLoader to AutoTokenizer; eliminates the tokenization-logger suppress workaround
- Suppress httpx INFO logs; hub 1.16 uses httpx internally and logs every HTTP request

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: dxqb <183307934+dxqb@users.noreply.github.com>
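
The orig_tokenizer approach described in the commit message above can be illustrated with a toy example. This is only a sketch: `ToyTokenizer` and `ToyModel` are hypothetical stand-ins, not OneTrainer or transformers classes, and the real `create_pipeline()` returns a pipeline rather than a tokenizer. The idea shown is that a deep copy taken at load time stays untouched when embedding tokens are later added to the working tokenizer.

```python
import copy


class ToyTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer (illustration only)."""

    def __init__(self, vocab):
        self.vocab = dict(vocab)

    def add_tokens(self, tokens):
        # Give each new token the next free id.
        for token in tokens:
            self.vocab.setdefault(token, len(self.vocab))


class ToyModel:
    """Hypothetical model wrapper; only the tokenizer handling is sketched."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        # Snapshot taken at load time, before any embedding tokens are added.
        self.orig_tokenizer = copy.deepcopy(tokenizer)

    def create_pipeline(self, use_original_tokenizers=False):
        # Savers ask for the untouched snapshot; training uses the working copy.
        return self.orig_tokenizer if use_original_tokenizers else self.tokenizer


model = ToyModel(ToyTokenizer({"a": 0, "b": 1}))
model.tokenizer.add_tokens(["<embedding_0>"])

training_vocab = model.create_pipeline().vocab
saved_vocab = model.create_pipeline(use_original_tokenizers=True).vocab
```

Compared with removing the added tokens again before saving, the snapshot does not depend on tokenizer internals, which appears to be the motivation given in the commit message.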
@dxqb dxqb added the preview merged in the preview branch label May 30, 2026
@AmaelG

AmaelG commented Jun 1, 2026

Unsure if this is in scope for this PR, but I think it would be useful to expose an optional toggle to train Anima's llm_adapter.
From my testing with multi-concept training, training the llm adapter seems to improve concept adherence, converge faster, and reach lower loss/val.
I know tdrussel recommends against training it, but in my experiments I have not seen obvious degradation of general knowledge, while the trained concepts became more reliable.

@Silvicultor

This comment was marked as resolved.

@dxqb

dxqb commented Jun 2, 2026

Collaborator Author

> Unsure if this is in scope for this PR, but I think it would be useful to expose an optional toggle to train Anima's llm_adapter. From my testing with multi-concept training, training the llm adapter seems to improve concept adherence, converge faster, and reach lower loss/val. I know tdrussel recommends against training it, but in my experiments I have not seen obvious degradation of general knowledge, while the trained concepts became more reliable.

Training text components should be a thing of the past. It has always been a crutch for diffusion models that weren't very capable yet, so I'm hesitant to reintroduce it, with all the problems that come with it (such as multiple learning rates and many more failure modes).
If there is strong community support that this is needed, maybe, but if even the model's creator advises against it...

dxqb added a commit to TheForgotten69/OneTrainer that referenced this pull request Jun 3, 2026
dxqb and others added 2 commits June 4, 2026 20:28
torch._dynamo.config overrides are thread-local. The existing call in
checkpointing_util runs in the main thread and is invisible to the
training thread spawned by the UI. This caused compiled optimizers
(e.g. AdamW_adv with compiled_optimizer=True) to hit the default
recompile_limit of 8 and abort with FailOnRecompileLimitHit when
training models with more than 8 distinct parameter shapes.

Fix: call init_compile() from GenericTrainer.__init__, which runs in
whichever thread/process owns training (UI thread, CLI main thread,
or torch.multiprocessing.spawn subprocess for multi-GPU).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
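
The failure mode described in the commit message above can be reproduced without torch. The sketch below uses the standard library's `threading.local` as a stand-in for a thread-local config. The names `_config`, `init_compile`, `current_limit` and the override value 64 are assumptions for illustration; only the default limit of 8 comes from the commit message.

```python
import threading

# Hypothetical stand-in for a thread-local config object; this is not torch code.
_config = threading.local()
DEFAULT_RECOMPILE_LIMIT = 8  # default mentioned in the commit message


def init_compile(limit=64):
    # Sets the override only for the thread that calls this function.
    _config.recompile_limit = limit


def current_limit():
    # Threads that never called init_compile() fall back to the default.
    return getattr(_config, "recompile_limit", DEFAULT_RECOMPILE_LIMIT)


def run_in_thread(fn):
    # Run fn in a fresh thread and hand its return value back to the caller.
    result = []
    worker = threading.Thread(target=lambda: result.append(fn()))
    worker.start()
    worker.join()
    return result[0]


init_compile()  # called in the main thread only
main_limit = current_limit()
worker_limit_before_fix = run_in_thread(current_limit)


def trainer_init():
    # The fix: initialise inside whichever thread owns training.
    init_compile()
    return current_limit()


worker_limit_after_fix = run_in_thread(trainer_init)
```

Here the worker thread sees the default until the initialisation runs inside it, which mirrors why the call was moved into GenericTrainer.__init__.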
@dxqb

dxqb commented Jun 4, 2026

Collaborator Author

torch._dynamo.exc.FailOnRecompileLimitHit: Hard failure due to fullgraph=True

Fixed by #1495, which has been merged into this PR.

@dxqb dxqb mentioned this pull request Jun 4, 2026
@dxqb dxqb linked an issue Jun 4, 2026 that may be closed by this pull request
dxqb added a commit that referenced this pull request Jun 4, 2026
@FuouM

FuouM commented Jun 5, 2026

Thank you for your great work!

I've been training Anima LoRAs in OneTrainer and using the same dataset/settings in Kohya sd-scripts. Training itself works fine, but LoRAs saved from OneTrainer (preview branch) don't load correctly in ComfyUI when paired with the standard single-checkpoint Anima model (anima-base-v1.0.safetensors). I believe this is due to the LoRA key naming on export.

Example key:

  • sd-scripts: lora_unet_blocks_0_self_attn_q_proj.lora_down.weight
  • OneTrainer: transformer.transformer_blocks.0.attn1.to_q.lora_down.weight

Other model types in OneTrainer already handle this via convert_*_lora.py key sets (e.g. Flux, HiDream, SD3). Anima's AnimaLoRASaver and AnimaLoRALoader both return None from _get_convert_key_sets(), so no conversion runs on save or load.

Diffusers has the inverse mapping in _convert_non_diffusers_anima_lora_to_diffusers() (lora_conversion_utils.py), which lines up with the rename table already documented in AnimaModel.py (diffusers_to_original()).

I prototyped the conversion script below. It may be missing mappings, as I haven't tested it exhaustively yet:

# convert_anima_lora.py
from modules.util.convert.lora.convert_lora_util import LoraConversionKeySet


def __map_anima_blocks(parent: LoraConversionKeySet) -> list[LoraConversionKeySet]:
    return [LoraConversionKeySet(
        omi_prefix=f"blocks.{i}",
        diffusers_prefix=f"transformer_blocks.{i}",
        legacy_diffusers_prefix=f"blocks_{i}",
        parent=parent,
        next_omi_prefix=f"blocks.{i + 1}",
        next_diffusers_prefix=f"transformer_blocks.{i + 1}",
    ) for i in range(100)]


def __map_transformer_block(key_prefix: LoraConversionKeySet) -> list[LoraConversionKeySet]:
    # Each tuple: (original/OMI module name, Diffusers module name, legacy sd-scripts-style name)
    mappings = [
        ("self_attn.q_proj", "attn1.to_q", "self_attn_q_proj"),
        ("self_attn.k_proj", "attn1.to_k", "self_attn_k_proj"),
        ("self_attn.v_proj", "attn1.to_v", "self_attn_v_proj"),
        ("self_attn.output_proj", "attn1.to_out.0", "self_attn_output_proj"),
        ("cross_attn.q_proj", "attn2.to_q", "cross_attn_q_proj"),
        ("cross_attn.k_proj", "attn2.to_k", "cross_attn_k_proj"),
        ("cross_attn.v_proj", "attn2.to_v", "cross_attn_v_proj"),
        ("cross_attn.output_proj", "attn2.to_out.0", "cross_attn_output_proj"),
        ("mlp.layer1", "ff.net.0.proj", "mlp_layer1"),
        ("mlp.layer2", "ff.net.2", "mlp_layer2"),
    ]

    return [
        LoraConversionKeySet(omi, diffusers, legacy_diffusers_prefix=legacy, parent=key_prefix)
        for omi, diffusers, legacy in mappings
    ]


def convert_anima_lora_key_sets() -> list[LoraConversionKeySet]:
    keys = []

    transformer = LoraConversionKeySet(
        "lora_unet",
        "transformer",
        legacy_diffusers_prefix="lora_unet",
    )

    for block_prefix in __map_anima_blocks(transformer):
        keys += __map_transformer_block(block_prefix)

    return keys

After converting, the output in ComfyUI seems to be affected by the LoRA as expected.
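
For readers who want to see the rename without OneTrainer's `LoraConversionKeySet` machinery, here is a standalone sketch of the same mapping. It is an illustration under assumptions: `to_sd_scripts_key` is a hypothetical helper, it only covers the ten per-block modules listed in the prototype above, and it returns `None` for any other key.

```python
import re

# Diffusers module name -> original Anima module name, copied from the
# prototype above (per-block attention and MLP layers only).
MODULE_RENAMES = {
    "attn1.to_q": "self_attn.q_proj",
    "attn1.to_k": "self_attn.k_proj",
    "attn1.to_v": "self_attn.v_proj",
    "attn1.to_out.0": "self_attn.output_proj",
    "attn2.to_q": "cross_attn.q_proj",
    "attn2.to_k": "cross_attn.k_proj",
    "attn2.to_v": "cross_attn.v_proj",
    "attn2.to_out.0": "cross_attn.output_proj",
    "ff.net.0.proj": "mlp.layer1",
    "ff.net.2": "mlp.layer2",
}

_BLOCK_KEY = re.compile(r"^transformer\.transformer_blocks\.(\d+)\.(.+)$")


def to_sd_scripts_key(key):
    """Rename one Diffusers-style LoRA key to the sd-scripts style, or return None."""
    match = _BLOCK_KEY.match(key)
    if match is None:
        return None
    index, rest = match.groups()
    for diffusers_name, original_name in MODULE_RENAMES.items():
        if rest.startswith(diffusers_name + "."):
            # Keep whatever follows the module name (e.g. lora_down.weight).
            suffix = rest[len(diffusers_name) + 1:]
            flat_name = original_name.replace(".", "_")
            return f"lora_unet_blocks_{index}_{flat_name}.{suffix}"
    return None


converted = to_sd_scripts_key(
    "transformer.transformer_blocks.0.attn1.to_q.lora_down.weight"
)
```

Applied to the OneTrainer example key from this comment, the helper yields the sd-scripts example key.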

@dxqb

dxqb commented Jun 5, 2026

Collaborator Author

> I've been training Anima LoRAs in OneTrainer and using the same dataset/settings in Kohya sd-scripts. Training itself works fine, but LoRAs saved from OneTrainer (preview branch) don't load correctly in ComfyUI when paired with the standard single-checkpoint Anima model (anima-base-v1.0.safetensors). I believe this is due to the LoRA key naming on export.

Comfy-Org/ComfyUI#14182

@Silvicultor

> I've been training Anima LoRAs in OneTrainer and using the same dataset/settings in Kohya sd-scripts. Training itself works fine, but LoRAs saved from OneTrainer (preview branch) don't load correctly in ComfyUI when paired with the standard single-checkpoint Anima model (anima-base-v1.0.safetensors). I believe this is due to the LoRA key naming on export.

> Comfy-Org/ComfyUI#14182

Doesn't look like Comfyanon wants to merge this one. Also keep in mind that other inference tools (at least the ones that aren't built on Diffusers) would have to make the same change to their code. So I think OneTrainer should include the conversion logic proposed above in its code and settle for the de facto standard that is already established. I know OT wants to use Diffusers keys whenever possible for consistency, and that's perfectly fine for models like Flux or Qwen, whose original repos are in Diffusers format, but this is a special case. The initial Anima release wasn't in Diffusers format, so it's hard to argue for the Diffusers keys.

@dxqb

dxqb commented Jun 5, 2026

Collaborator Author

> I've been training Anima LoRAs in OneTrainer and using the same dataset/settings in Kohya sd-scripts. Training itself works fine, but LoRAs saved from OneTrainer (preview branch) don't load correctly in ComfyUI when paired with the standard single-checkpoint Anima model (anima-base-v1.0.safetensors). I believe this is due to the LoRA key naming on export.

> Comfy-Org/ComfyUI#14182

> Doesn't look like Comfyanon wants to merge this one

If that is the case, they should close the PR. As for the other points, we already had this discussion on Discord.

@dxqb

dxqb commented Jun 5, 2026

Collaborator Author

By the way, this PR already includes conversion code: https://github.com/dxqb/OneTrainer/blob/03b7156c49bac5c8f5a8f13de259357f94047d75/modules/model/AnimaModel.py#L31

It's just not used for LoRAs currently (only for full finetunes), because this was the consistent and accepted way to do things for all other models. If inference tools want to change that now, they should make that clear (by closing the PR, for example)

@dxqb dxqb changed the title Anima New model: Anima Jun 6, 2026
dxqb added a commit that referenced this pull request Jun 14, 2026
dxqb added 7 commits June 18, 2026 01:27
# Conflicts:
#	modules/util/compile_util.py
Mirrors upstream commit 75a44d2, which converted the rest of the
codebase from the trailing factory.register() call to the @factory.register
decorator form.
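
The two registration styles mentioned in this commit message can be contrasted with a toy registry. This is a sketch under assumptions: `Registry` and the loader class names are hypothetical, and OneTrainer's actual `factory.register` signature may differ.

```python
class Registry:
    """Hypothetical registry, used only to contrast the two registration styles."""

    def __init__(self):
        self.items = {}

    def register(self, cls):
        self.items[cls.__name__] = cls
        # Returning the class lets register() double as a decorator.
        return cls


factory = Registry()


class TrailingStyleLoader:
    pass


factory.register(TrailingStyleLoader)  # trailing call form


@factory.register  # decorator form
class DecoratorStyleLoader:
    pass
```

Both forms leave the registry in the same state; the decorator form keeps the registration next to the class definition.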
# Conflicts:
#	modules/modelLoader/mixin/HFModelLoaderMixin.py
# Conflicts:
#	modules/modelLoader/mixin/HFModelLoaderMixin.py
#	requirements-global.txt
dxqb added a commit that referenced this pull request Jun 19, 2026
Audit fixes applied during merge:
- ModelType.py: register ANIMA in _MODEL_PARTS and supported_training_methods()
- AnimaModel.py: add missing release() abstract method
- BaseAnimaSetup.py: per-component checkpointing, 3/4-arg autocast helpers, release()-based prepare_text_caching
- Anima{FineTune,LoRA}Setup.py: latent_caching -> image_caching/text_caching
- BaseModelTabView/BaseTrainingTabView/TopBarController/BaseConvertModelUIView: wire up Anima UI
- test/run_lora_presets.sh: add Anima LoRA preset
@Silvicultor

I did a lot of testing with the current version of the PR over the last 2 weeks. Overall, Anima runs very well and is stable in OT.
What I tested:
- Normal LoRA training works
- Anima LoKr works
- Masked training works fine, except that normalizing the mask area loss can sometimes cause NaNs
- Torch compile (transformer blocks + optimizer) is fully functional after the workaround code was added
- Output quality of the LoRAs is similar to what other training tools produce (e.g. sd-scripts)

So hoping to see this in the master branch soon! The only thing from my perspective that is left to overcome is the LoRA key issue, but I know it’s being discussed right now. Until then I convert manually to Comfy format.

Labels

preview merged in the preview branch

Development

Successfully merging this pull request may close these issues.

[Feat]: Anima support

4 participants