New model: Anima #1487
- Bump requirements: transformers 4.57.6 → 5.9, huggingface-hub 0.34.4 → 1.16.1
- Remove HF_HUB_DISABLE_XET workaround from startup scripts; Xet is stable in hub 1.16
- Remove _prepare_sub_modules / snapshot_download prefetching; hub 1.16 fetches lazily on demand
- Delete thread_safety.py and apply_thread_safe_forward calls; workaround for transformers#42673 was fixed upstream in v5
- Replace _remove_added_embeddings_from_tokenizer (relied on internal Trie, removed in v5) with orig_tokenizer deep-copies stored at load time; model savers pass use_original_tokenizers=True to create_pipeline() so saved checkpoints use the unmodified tokenizer
- Switch ErnieModelLoader to AutoTokenizer; eliminates the tokenization-logger suppress workaround
- Suppress httpx INFO logs; hub 1.16 uses httpx internally and logs every HTTP request

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: dxqb <183307934+dxqb@users.noreply.github.com>
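The httpx log suppression listed in the commit above can be sketched with the standard `logging` module. This is a minimal illustration, not the code from this PR: the helper name `quiet_httpx` is hypothetical, and the choice of `WARNING` as the threshold is an assumption.

```python
import logging


def quiet_httpx(level: int = logging.WARNING) -> None:
    """Raise the threshold of the "httpx" logger so per-request INFO lines are dropped.

    `quiet_httpx` is a hypothetical helper name; the PR may do this differently.
    """
    logging.getLogger("httpx").setLevel(level)


quiet_httpx()
```

Warnings and errors from httpx still get through; only records below the chosen level are filtered.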
Unsure if this is in scope for this PR, but I think it would be useful to expose an optional toggle to train Anima's llm_adapter.
This comment was marked as resolved.
Training text components should be a thing of the past. It's always been a crutch for diffusion models that weren't very capable yet, so I'm hesitant to reintroduce it, with all the problems that come with it (such as multiple learning rates and many more failure modes).
torch._dynamo.config overrides are thread-local. The existing call in checkpointing_util runs in the main thread and is invisible to the training thread spawned by the UI. This caused compiled optimizers (e.g. AdamW_adv with compiled_optimizer=True) to hit the default recompile_limit of 8 and abort with FailOnRecompileLimitHit when training models with more than 8 distinct parameter shapes.

Fix: call init_compile() from GenericTrainer.__init__, which runs in whichever thread/process owns training (UI thread, CLI main thread, or torch.multiprocessing.spawn subprocess for multi-GPU).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
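The failure mode described in this commit can be illustrated without torch. The sketch below uses `threading.local` as a stand-in for a thread-local config store; it is an analogy under stated assumptions, not how torch._dynamo is implemented. The default of 8 comes from the commit message, while the override value 64 and the bodies of `init_compile` and `train` are hypothetical.

```python
import threading

# Stand-in for a thread-local config store (assumption: this is an analogy only).
_config = threading.local()

DEFAULT_RECOMPILE_LIMIT = 8  # default named in the commit message


def get_recompile_limit() -> int:
    # A thread that never set an override sees the default.
    return getattr(_config, "recompile_limit", DEFAULT_RECOMPILE_LIMIT)


def init_compile() -> None:
    # Hypothetical body: raise the limit for the *calling* thread only.
    _config.recompile_limit = 64


def train(results: dict, call_init: bool) -> None:
    # Stand-in for trainer start-up running in the training thread.
    if call_init:
        init_compile()
    results["limit"] = get_recompile_limit()


def run_in_thread(call_init: bool) -> int:
    results: dict = {}
    worker = threading.Thread(target=train, args=(results, call_init))
    worker.start()
    worker.join()
    return results["limit"]


init_compile()  # override applied in the main thread only
limit_without_fix = run_in_thread(call_init=False)  # worker still sees the default
limit_with_fix = run_in_thread(call_init=True)      # worker applies its own override
```

Calling the initializer from the code path that actually runs training, as the fix does, makes the override land in the thread that needs it regardless of how that thread was started.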
Fixed by #1495, merged into this PR.
Thank you for your great work! I've been training Anima LoRAs in OneTrainer and using the same dataset/settings in Kohya sd-scripts. Training itself works fine, but LoRAs saved from OneTrainer (preview branch) don't load correctly in ComfyUI when paired with the standard single-checkpoint Anima model (anima-base-v1.0.safetensors). I believe this is due to the LoRA key naming on export. Example key:
Other model types in OneTrainer already handle this via […]. Diffusers has the inverse mapping in […]. I prototyped the conversion script below. It might be missing things, as I haven't tested it exhaustively yet:

    # convert_anima_lora.py
    from modules.util.convert.lora.convert_lora_util import LoraConversionKeySet


    def __map_anima_blocks(parent: LoraConversionKeySet) -> list[LoraConversionKeySet]:
        # One key set per block index 0..99.
        return [LoraConversionKeySet(
            omi_prefix=f"blocks.{i}",
            diffusers_prefix=f"transformer_blocks.{i}",
            legacy_diffusers_prefix=f"blocks_{i}",
            parent=parent,
            next_omi_prefix=f"blocks.{i + 1}",
            next_diffusers_prefix=f"transformer_blocks.{i + 1}",
        ) for i in range(100)]


    def __map_transformer_block(key_prefix: LoraConversionKeySet) -> list[LoraConversionKeySet]:
        # Each tuple is (OMI name, Diffusers name, legacy Diffusers name).
        mappings = [
            ("self_attn.q_proj", "attn1.to_q", "self_attn_q_proj"),
            ("self_attn.k_proj", "attn1.to_k", "self_attn_k_proj"),
            ("self_attn.v_proj", "attn1.to_v", "self_attn_v_proj"),
            ("self_attn.output_proj", "attn1.to_out.0", "self_attn_output_proj"),
            ("cross_attn.q_proj", "attn2.to_q", "cross_attn_q_proj"),
            ("cross_attn.k_proj", "attn2.to_k", "cross_attn_k_proj"),
            ("cross_attn.v_proj", "attn2.to_v", "cross_attn_v_proj"),
            ("cross_attn.output_proj", "attn2.to_out.0", "cross_attn_output_proj"),
            ("mlp.layer1", "ff.net.0.proj", "mlp_layer1"),
            ("mlp.layer2", "ff.net.2", "mlp_layer2"),
        ]
        return [
            LoraConversionKeySet(omi, diffusers, legacy_diffusers_prefix=legacy, parent=key_prefix)
            for omi, diffusers, legacy in mappings
        ]


    def convert_anima_lora_key_sets() -> list[LoraConversionKeySet]:
        keys = []
        transformer = LoraConversionKeySet(
            "lora_unet",
            "transformer",
            legacy_diffusers_prefix="lora_unet",
        )
        # Expand every block into its per-layer key sets.
        for block_prefix in __map_anima_blocks(transformer):
            keys += __map_transformer_block(block_prefix)
        return keys

After converting, the output in ComfyUI seems to be affected by the LoRA as expected.
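To make the effect of the mapping table concrete, here is a standalone sketch that renames a single module path from the Diffusers layout to the original layout using the same name pairs as the prototype. It does not use OneTrainer's LoraConversionKeySet machinery; the function `rename_module_path` is hypothetical, and the `.lora_down.weight` suffix in the usage line is an assumed key ending chosen only for illustration.

```python
import re

# Diffusers module name -> original-checkpoint module name (pairs taken from the prototype).
DIFFUSERS_TO_ORIGINAL = {
    "attn1.to_q": "self_attn.q_proj",
    "attn1.to_k": "self_attn.k_proj",
    "attn1.to_v": "self_attn.v_proj",
    "attn1.to_out.0": "self_attn.output_proj",
    "attn2.to_q": "cross_attn.q_proj",
    "attn2.to_k": "cross_attn.k_proj",
    "attn2.to_v": "cross_attn.v_proj",
    "attn2.to_out.0": "cross_attn.output_proj",
    "ff.net.0.proj": "mlp.layer1",
    "ff.net.2": "mlp.layer2",
}

_BLOCK_RE = re.compile(r"^transformer_blocks\.(\d+)\.(.+)$")


def rename_module_path(path: str) -> str:
    """Hypothetical helper: map one Diffusers-style path to the original naming.

    Paths that do not match a known block/layer pattern are returned unchanged.
    """
    match = _BLOCK_RE.match(path)
    if match is None:
        return path
    index, rest = match.groups()
    for diffusers_name, original_name in DIFFUSERS_TO_ORIGINAL.items():
        # Match the whole layer name, optionally followed by a dotted suffix.
        if rest == diffusers_name or rest.startswith(diffusers_name + "."):
            suffix = rest[len(diffusers_name):]
            return f"blocks.{index}.{original_name}{suffix}"
    return path


renamed = rename_module_path("transformer_blocks.3.attn1.to_q.lora_down.weight")
```

Matching on the full layer name plus a dot boundary avoids accidental partial matches between similarly named layers.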
Doesn't look like Comfyanon wants to merge this one. Also keep in mind that other inference tools (at least the ones that aren't built upon Diffusers) would have to make the same change to their code. So I say OneTrainer should include the conversion logic proposed above in its code and settle for the de facto standard already established. I know OT wants to use Diffusers keys whenever possible for consistency, and that's perfectly fine for models like Flux or Qwen, whose original repos are in Diffusers format, but this is a special case. The initial Anima release wasn't in Diffusers format, so it's hard to argue for the Diffusers keys.
If that is the case, they should close the PR. As for the other points, we already had this discussion on Discord.
By the way, this PR already includes conversion code: https://github.com/dxqb/OneTrainer/blob/03b7156c49bac5c8f5a8f13de259357f94047d75/modules/model/AnimaModel.py#L31 It's just not used for LoRAs currently (only for full finetunes), because this was the consistent and accepted way to do things for all other models. If inference tools want to change that now, they should make that clear (by closing the PR, for example).
# Conflicts: # requirements-rocm.txt
# Conflicts: # modules/util/compile_util.py
Mirrors upstream commit 75a44d2, which converted the rest of the codebase from the trailing factory.register() call to the @factory.register decorator form.
# Conflicts: # modules/modelLoader/mixin/HFModelLoaderMixin.py
# Conflicts: # modules/modelLoader/mixin/HFModelLoaderMixin.py # requirements-global.txt
Audit fixes applied during merge:
- ModelType.py: register ANIMA in _MODEL_PARTS and supported_training_methods()
- AnimaModel.py: add missing release() abstract method
- BaseAnimaSetup.py: per-component checkpointing, 3/4-arg autocast helpers, release()-based prepare_text_caching
- Anima{FineTune,LoRA}Setup.py: latent_caching -> image_caching/text_caching
- BaseModelTabView/BaseTrainingTabView/TopBarController/BaseConvertModelUIView: wire up Anima UI
- test/run_lora_presets.sh: add Anima LoRA preset
I did a lot of testing with the current version of the PR over the last 2 weeks. Overall, Anima runs very well and is stable in OT, so I'm hoping to see this in the master branch soon! From my perspective, the only thing left to resolve is the LoRA key issue, but I know it's being discussed right now. Until then, I convert manually to Comfy format.
Test in preview branch: https://github.com/Nerogar/OneTrainer/tree/preview
Includes: